17 Mar 2009 15:20
Re: DRMAA2: TERMINATED vs. FAILED state
I think ultimately the purpose here is to be able to tell when a job was killed by a user or administrator or a forced migration or some such. The internal/external explanation captures that best for me. I think the other subtleties of how exactly a job failed should be expressed another way, such as the substate information.

Daniel

Peter Tröger wrote:
> Dear all,
>
> this discussion thread is intended to finalize the discussion about job
> states after execution end in DRMAA2.
> In DRMAA1, there is only the FAILED state, expressing that the job was
> running but did not finish successfully for some reason. Piotr proposed
> a separation between FAILED and TERMINATED jobs:
>
> http://www.ogf.org/pipermail/drmaa-wg/2009-January/000985.html
>
> We meanwhile had different proposals regarding this idea:
>
> Option 1)
> TERMINATED state = resubmission might help,
> FAILED state = resubmission unlikely to help (machine problem,
> misconfiguration)
>
> Option 2)
> TERMINATED state = triggered by an external entity,
> FAILED state = job terminated by itself
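
To make the Option 2 reading concrete, here is a rough sketch of how the internal/external split could look, with the finer detail pushed into a substate field as suggested above. None of these names come from DRMAA1 or any DRMAA2 draft: `EndState`, `JobEnd`, `killed_externally`, `substate`, and `classify` are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EndState(Enum):
    """Hypothetical end states under discussion (not an actual DRMAA API)."""
    FAILED = "FAILED"          # job ended unsuccessfully by itself
    TERMINATED = "TERMINATED"  # job was ended by an external entity


@dataclass
class JobEnd:
    """Hypothetical record of how a job ended unsuccessfully."""
    killed_externally: bool         # user, administrator, forced migration, ...
    substate: Optional[str] = None  # free-form detail, e.g. "killed by admin"


def classify(end: JobEnd) -> EndState:
    """Option 2: external trigger means TERMINATED, otherwise FAILED."""
    return EndState.TERMINATED if end.killed_externally else EndState.FAILED
```

With this shape, `classify(JobEnd(True, "forced migration"))` yields `EndState.TERMINATED`, while a job that crashed on its own yields `EndState.FAILED`; the exact cause stays in `substate` instead of multiplying top-level states.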