Edward Chernenko | 7 Sep 2006 18:28

Re: About long query/queries

2006/9/7, Platonides <platonides <at> gmail.com>:
> What about doing it locally with a dump? It seems much more efficient to me.

Good idea, but I think the dump should be placed outside my account:
1. other users could use it for tasks which don't require making
complex SQL queries; 2. I have a 256 MB disk quota, while the ruwiki dump is
about 400 MB.

2006/9/7, Gregory Maxwell <gmaxwell <at> gmail.com>
> Are you talking about a query that will be run once or a query that
> will be executed from a cgi script.
No, it will be run manually (or from cron, once per day).

> select page_namespace, page_title from page;  on ruwiki_p takes under
> a second... I wouldn't call that a long query.

Not all rows of the result are fetched right after executing the query.
The normal 'mysql' client receives all rows, prints them and exits. My
application needs to do the following after getting each row of the result:

 1. make one more SQL query to fetch the page text:
    SELECT old_text, old_flags FROM text WHERE old_id = (SELECT rev_text
    FROM revision WHERE rev_id = ? )
    (where '?' is page_latest from the first query)
 2. uncompress the text if there is 'gzip' in old_flags.
 3. analyze the text (that's fast, we can ignore this step).
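
The per-row steps above can be sketched roughly as follows. This is only an illustration, not the actual application: it uses an in-memory SQLite database as a hypothetical stand-in for the MySQL replica, the helpers `build_demo_db`, `raw_deflate`, `decode_text` and `page_texts` are made-up names, and it assumes the revision column is called rev_text_id (as in the MediaWiki schema) and that 'gzip' text is a raw DEFLATE stream such as PHP's gzdeflate() produces.

```python
import sqlite3
import zlib


def decode_text(old_text: bytes, old_flags: str) -> str:
    """Step 2: inflate the text when 'gzip' is listed in old_flags."""
    if "gzip" in old_flags.split(","):
        # Assumption: raw DEFLATE without zlib/gzip header (PHP gzdeflate()).
        old_text = zlib.decompress(old_text, -zlib.MAX_WBITS)
    return old_text.decode("utf-8")


def raw_deflate(data: bytes) -> bytes:
    """Demo helper: build a raw DEFLATE stream to store as 'gzip' text."""
    comp = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    return comp.compress(data) + comp.flush()


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical miniature of the page / revision / text tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE page (page_title TEXT, page_latest INTEGER);
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_text_id INTEGER);
        CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text BLOB, old_flags TEXT);
        """
    )
    db.executemany("INSERT INTO page VALUES (?, ?)", [("Plain", 10), ("Packed", 11)])
    db.executemany("INSERT INTO revision VALUES (?, ?)", [(10, 100), (11, 101)])
    db.executemany(
        "INSERT INTO text VALUES (?, ?, ?)",
        [
            (100, b"plain text", "utf-8"),
            (101, raw_deflate(b"compressed text"), "utf-8,gzip"),
        ],
    )
    return db


def page_texts(db: sqlite3.Connection):
    """Step 1: for every row of the first query, run one extra query for the text."""
    first = db.execute("SELECT page_title, page_latest FROM page")
    for title, latest in first:
        row = db.execute(
            "SELECT old_text, old_flags FROM text WHERE old_id = "
            "(SELECT rev_text_id FROM revision WHERE rev_id = ?)",
            (latest,),
        ).fetchone()
        if row is not None:
            yield title, decode_text(row[0], row[1])


if __name__ == "__main__":
    for title, text in page_texts(build_demo_db()):
        print(title, len(text))  # step 3 (the analysis) would go here
```

The point of the sketch is that the first result set stays open while each extra query runs, which is where the pause between row fetches comes from.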

As you can see, there is a small pause between fetching rows of the result
of the first query. Even if this pause is only 0.05 seconds, the first query
will take ~83 minutes to finish (for the 100000 articles of ruwiki).

Gregory Maxwell | 7 Sep 2006 18:45

Re: About long query/queries

Text is not in the database on toolserver, thus you can't grab the text.

On 9/7/06, Edward Chernenko <edwardspec <at> gmail.com> wrote:
> 2006/9/7, Platonides <platonides <at> gmail.com>:
> > What about doing it locally with a dump? It seems much more efficient to me.
>
> Good idea but I think that dump should be placed outside my account:
> 1. other users can use it for tasks which doesn't require making
> complex SQL queries; 2. I have 256 Mb disk quota while ruwiki dump is
> about 400 Mb.
>
>
> 2006/9/7, Gregory Maxwell <gmaxwell <at> gmail.com>
> > Are you talking about a query that will be run once or a query that
> > will be executed from a cgi script.
> No, that will be run manually (or using cron - one time per day).
>
> > select page_namespace, page_title from page;  on ruwiki_p takes under
> > a second... I wouldn't call that a long query.
>
> Not all rows of result are fetched right after executing the query.
> Normal 'mysql' application receives all rows, prints it and exits. My
> application need (after getting one row of result) to:
>
>  1. make one more sql query: fetch page text
> SELECT old_text, old_flags FROM text WHERE old_id = (SELECT rev_text
> FROM revision WHERE rev_id = ? )
>  (where '?' is page_latest from first query)
>  2. uncompress text if there is 'gzip' in old_flags.
> 3. analyze text (that's fast, we can ignore this step).

Platonides | 7 Sep 2006 19:44

Re: About long query/queries

From: "Edward Chernenko" <edwardspec <at> gmail.com>
Sent: Thursday, September 07, 2006 6:28 PM
Subject: Re: [Toolserver-l] About long query/queries

> 2006/9/7, Platonides
>> What about doing it locally with a dump? It seems much more efficient to 
>> me.
>
> Good idea but I think that dump should be placed outside my account:
> 1. other users can use it for tasks which doesn't require making
> complex SQL queries; 2. I have 256 Mb disk quota while ruwiki dump is
> about 400 Mb.
>
Uh? I was thinking of doing it on your computer, not necessarily on the
toolserver. That way you can control everything on it. About placing it, well,
it's a public download :P

Edward Chernenko | 7 Sep 2006 20:24

Re: About long query/queries

2006/9/7, Platonides <platonides <at> gmail.com>:
> Uh? I was thinking in doing it on your computer, not neccesarily on the
> toolserver. Thus you can control everything on it. About placing it, well,
> it's a public download :P

If only I could afford 400 MB of traffic per day, I'd never use the
Toolserver at all...

-- 
Edward Chernenko <edwardspec <at> gmail.com>

Platonides | 8 Sep 2006 23:28

Re: About long query/queries

If I understood your first email correctly, you need to check the current
version of all ruwiki pages, not necessarily the 'last minute' one, as a
one-off check.
You can download the ruwiki dump (the last dump is a month old, but surely a
new one will be done shortly), which is 87.2 MB (102.8 MB if you also want
discussion and user pages).

So you download it, have your computer run for two centuries ;-)
measuring it and, at last, upload the results.
Even if you update it once a month, it doesn't seem so exhausting...

Are you connecting by 56k? 

