[GE users] Jobs in "hold" state disappear. Debugging help?
gutnik at gmail.com
Wed Mar 24 21:11:47 GMT 2010
On Wed, Mar 24, 2010 at 1:24 PM, reuti <reuti at staff.uni-marburg.de> wrote:
> On 24.03.2010 at 21:18, gutnik wrote:
>> I'm using a CAD tool (Cadence) that is somewhat integrated with SGE.
>> In one situation, it launches a number of simulations, and one
>> "cleanup" job. I see the simulation jobs submitted, and I see the
>> cleanup job submitted with a hold that depends on the simulation
>> jobs. Great.
>> What I see is that the simulations finish, and about 30 seconds
>> later, the cleanup job is removed from the queue without ever being
>> run. So, I have two questions:
> Removed from `qstat` or within Cadence?
I believe the job is removed by SGE. I don't think Cadence removes it; if there
were a log I could see, I could confirm that.
>> 2) Why is the job being removed? One possibility (from the manual) is
>> that the simulations are exiting with code 100. Is that
> Then it's not removed; instead, the job is put in error state.
> To investigate this you can use:
> $ qstat -s z
qstat -s z
lists the job.
> $ qacct -j <job_id_of_cleaner>
returns "error: jobid 1234 not found"
(where 1234 is the cleanup job's ID).
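Since the cleanup job has no accounting record, one way to test the exit-code-100 theory is to look at the records of the simulation jobs themselves, which did run. Below is a minimal shell sketch, assuming the usual `qacct -j` layout of one `field  value` pair per line; the helper name `job_exit_fields` and the job IDs are hypothetical placeholders. A `failed` or `exit_status` of 100 on a simulation job would point at the error-state path reuti describes.

```shell
#!/bin/sh
# Hypothetical helper: filter `qacct -j <jobid>` output (read on stdin)
# down to the fields that show how each job ended.
job_exit_fields() {
    # qacct prints one "field  value" pair per line; keep only these three
    awk '$1 == "jobnumber" || $1 == "failed" || $1 == "exit_status" { print $1, $2 }'
}

# Example usage; 1230-1233 are placeholder IDs for the simulation jobs:
# for id in 1230 1231 1232 1233; do
#     qacct -j "$id" | job_exit_fields
# done
```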
> You can even define in a ~/.sge_request file that you get an email for
> each started/ended/aborted job (these options are attached to every job,
> even when you have no chance to enter such an option in your application).
How do I do that? (Ideally, I'd like an email for each job, for each change of
status, and for the reason behind it.)
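A minimal ~/.sge_request sketch for this, assuming Cadence submits the jobs under your own user account via qsub, which reads this per-user defaults file. The mail address is a placeholder. As far as I know, qsub's `-m` letters cover only begin (b), end (e), abort/reschedule (a) and suspend (s) events, so this will not produce a mail for every state change (for example, a hold being released).

```
# ~/.sge_request: default qsub options applied to every job this user submits
# Mail on begin (b), end (e), abort/reschedule (a) and suspend (s)
-m beas
# Recipient (placeholder address; replace with your own)
-M user@example.com
```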