Christer Jacobsson | 17 Feb 2012 01:05

Misbehaving WU?

As Salaam Aleikum!

My boinc2 SETI@home app seems to have trouble with the following WU, as listed in the JBSWU_Monitor display below:

$<begin>

Credits as of 02-15-12 16:26:50
                         Total Average
GAEA              :   38028.81    1.03
BoincHost 6270800 :    9776.35    1.03

IP Host name/addr :  localhost
Queued WU's       :       0

Next benchmark    :   1 day   23:21:20

Slot/CPU: 0         Project: SETI@home
WU: 25no11ak.3420.3753.9.10.189
Percentage done   :    0.00%
CPU time          :  115844   32:10:44
Time to go (Frm 2): 1024990  284:43:10
Est. Total time   : 1024990  284:43:10

$<end>

Look at the CPU figures for the above WU: it is estimated to take 284 hours to complete. But here is my problem: as soon as I
shut down the S@H app, whether by doing a 'boinc_cmd --quit' or by a controlled reboot, the processing of
the WU will NOT continue from the point where it was interrupted; instead S@H will *restart* the WU
processing from the very beginning, meaning that all the work done up to the shutdown is
lost. So far I haven't managed to keep my box running without a reboot for the whole 284 hours that's needed

Bob | 17 Feb 2012 06:11

Re: Misbehaving WU?

** Reply to message from "Christer Jacobsson cribo.jacobsson-at-gaea.se" on
Fri, 17 Feb 2012 00:05:55 -0000

> Slot/CPU: 0         Project: SETI@home
> WU: 25no11ak.3420.3753.9.10.189
> Percentage done   :    0.00%
> CPU time          :  115844   32:10:44
> Time to go (Frm 2): 1024990  284:43:10
> Est. Total time   : 1024990  284:43:10

I don't know where the "Time to go" figure comes from (I think it may be from the
JBSWU_Monitor display, which I do not use).  I looked in
"sched_request_setiathome.berkeley.edu.xml", which contains a
"cpu_time_remaining" value for each work unit; I would look at what is there.  If the
time remaining really is that long, I think I would stop BOINC, delete the work unit from
projects\setiathome.berkeley.edu, and then restart BOINC.
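
For reference, a minimal Python sketch of pulling those values out of such a file. It assumes only that the file is well-formed XML and that each "cpu_time_remaining" element holds a number of seconds, as described above. The function name, the sample string, and the element names other than "cpu_time_remaining" are made up for illustration and are not taken from a real request file.

```python
import xml.etree.ElementTree as ET

def remaining_seconds(xml_text):
    """Return every cpu_time_remaining value found in xml_text, as floats."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the enclosing element names do not matter.
    return [float(el.text) for el in root.iter("cpu_time_remaining")]

# Hypothetical sample: only the cpu_time_remaining name comes from the thread.
SAMPLE = """<request>
  <task><cpu_time_remaining>77888.0</cpu_time_remaining></task>
  <task><cpu_time_remaining>1024990.0</cpu_time_remaining></task>
</request>"""

if __name__ == "__main__":
    for secs in remaining_seconds(SAMPLE):
        print(f"{secs:.0f} s = {secs / 3600:.1f} h")
```

To try it on a real file, read the file's contents into a string and pass that to remaining_seconds() in place of SAMPLE.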

On my system the longest time remaining is 77888 seconds, which is about 21 hours.  I think
that time is multiplied by some number and used as a timeout value to
abort the work unit if it gets into a loop.  I have had work units time out
after a few days.  So your time to go may be the timeout value.
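
The figures in this thread can be cross-checked with a few lines of Python. This is only a sketch of the seconds-to-H:MM:SS arithmetic, under the assumption that the raw numbers are seconds; the helper name hms is made up here and is not part of BOINC or JBSWU_Monitor.

```python
def hms(seconds):
    """Format a whole number of seconds as H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

print(hms(77888))    # 21:38:08
print(hms(115844))   # 32:10:44
print(hms(1024990))  # 284:43:10
```

The last two results match the "CPU time" and "Time to go" columns in the quoted monitor output, which is consistent with those raw numbers being seconds.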

------------------------------------

John Small | 17 Feb 2012 14:02

Re: Misbehaving WU?

** Reply to message from "Christer Jacobsson" <cribo.jacobsson <at> gaea.se> on Fri,
17 Feb 2012 00:05:55 -0000

>As Salaam Aleikum!
>
>My boinc2 SETI@home app seems to have trouble with the following WU as listed
>in the JBSWU_Monitor display below:
>
>$<begin>
>
>Credits as of 02-15-12 16:26:50
>                         Total Average
>GAEA              :   38028.81    1.03
>BoincHost 6270800 :    9776.35    1.03
>
>IP Host name/addr :  localhost
>Queued WU's       :       0
>
>Next benchmark    :   1 day   23:21:20
>
>
>
>Slot/CPU: 0         Project: SETI@home
>WU: 25no11ak.3420.3753.9.10.189
>Percentage done   :    0.00%
>CPU time          :  115844   32:10:44
>Time to go (Frm 2): 1024990  284:43:10
>Est. Total time   : 1024990  284:43:10
>
>$<end>

