Tim Starling | 1 Mar 02:58 2009

Re: possible revision comparison optimization with diff3?

tlg wrote:
> Hello, I run a semi-busy wiki, and I have been experiencing
> difficulties with its CPU load lately, with load jumping to as high as 140
> at noon (not 1.4, not 14, but ~140). Obviously this brought the site to a
> crawl. After investigation I have found the cause: multiple diff3
> comparisons were being run at the same time.
> Explaining the cause requires a little background. The wiki
> I run deals with editing large text files. It is common to see
> hundreds of KB of pure text on any given wiki page. Normally my servers
> would be able to handle the edit requests of these pages.
> However, it seems that search bots and crawlers (from both search engines
> and individual users) have been hitting my wiki pretty hard lately. Each of
> these bots tries to copy all the pages, including the revision history of
> each of these 100 KB wiki text pages. Since each page could have
> potentially hundreds of edits, for every single large text file, hundreds
> of revision history diffs (from lighttpd/apache -> php5 -> diff3? ) are
> spawned.

diff3 is invoked in two cases: on page save when there is an edit
conflict, and when someone clicks "undo". Neither is particularly
vital to the operation of the wiki, so the first thing you should do
is turn them both off, using

$wgDiff3 = false;

in LocalSettings.php. Then see if that fixes your load problems. If it
does, then you were right about diff3 being the problem. Next you
should look at your logs to find out where the edits or undo requests
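
As a rough illustration of the log check suggested above, here is a minimal Python sketch that counts, per client IP, the requests that could end up invoking diff3. It is not part of MediaWiki. It assumes an Apache/lighttpd access log in common or combined format, that undo links carry an `undo=` query parameter, and that page saves arrive as a POST with `action=submit`. The function name `count_diff3_candidates` is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumed common/combined log format: client IP first, then the quoted
# request line ("METHOD target PROTOCOL") after the bracketed timestamp.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"')

def count_diff3_candidates(lines):
    """Count, per client IP, requests that could lead to a diff3 call."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        ip, method, target = m.groups()
        query = parse_qs(urlsplit(target).query)
        # Assumption: an "undo" click shows up as an undo=<revision id> parameter.
        is_undo = "undo" in query
        # Assumption: a page save is a POST with action=submit.
        is_save = method == "POST" and query.get("action") == ["submit"]
        if is_undo or is_save:
            hits[ip] += 1
    return hits
```

Passing an open log file to the function and calling `.most_common(10)` on the result lists the ten busiest clients. If one or two addresses dominate, those are the ones to look at first.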