[GE users] Jobs in "hold" state disappear. Debugging help?

reuti reuti at staff.uni-marburg.de
Wed Mar 24 21:19:47 GMT 2010


On 24.03.2010 at 22:11, gutnik wrote:

> On Wed, Mar 24, 2010 at 1:24 PM, reuti <reuti at staff.uni-marburg.de> wrote:
>> On 24.03.2010 at 21:18, gutnik wrote:
>>
>>> I'm using a CAD tool (Cadence) that is somewhat integrated with SGE.
>>> In one situation, it launches a number of simulations, and one
>>> "cleanup" job. I see the simulation jobs submitted, and I see the
>>> cleanup job submitted with a hold that depends on the simulation
>>> jobs. Great.
>>>
>>> What I see is that the simulations finish, and about 30 seconds
>>> later, the cleanup job is removed from the queue without ever being
>>> run. So, I have two questions:
>>
>> Removed from `qstat` or within Cadence?
>
> I believe the job is removed by SGE. I don't think Cadence removes it;
> if there were a log I could see, I could confirm that.

Okay, then let's have a look at the qmaster's messages file:
$SGE_ROOT/default/spool/qmaster/messages (or the local spool directory,
if one is configured). Any hint of a `qdel`?
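If the messages file is large, a small script can pull out just the lines for the cleaner job. This is a minimal sketch, not an SGE tool: it assumes one log entry per line, and it treats any line containing both the job ID (as a whole word) and the string "delet" as a deletion hint, because the exact wording of qmaster log entries may differ between versions. The script name find_qdel.py and both function names are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Iterable, Iterator, List


def job_events(messages: Path, job_id: str) -> Iterator[str]:
    """Yield lines of the qmaster messages file that mention job_id.

    Assumption: one log entry per line. The job ID is matched as a whole
    word, so job 123 does not also match 1234 or 12345.
    """
    pattern = re.compile(rf"\b{re.escape(job_id)}\b")
    with messages.open(errors="replace") as fh:
        for line in fh:
            if pattern.search(line):
                yield line.rstrip("\n")


def deletion_hints(lines: Iterable[str]) -> List[str]:
    """Keep only lines that look like a deletion (contain 'delet', any case)."""
    return [line for line in lines if "delet" in line.lower()]


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python find_qdel.py <path-to-qmaster-messages> <job_id>
    path, job = Path(sys.argv[1]), sys.argv[2]
    for hit in deletion_hints(job_events(path, job)):
        print(hit)
```

For example, `python find_qdel.py <path-to-messages> 1234` prints only the entries for job 1234 that mention a deletion; dropping the `deletion_hints` filter shows every entry for that job, which also covers the case where the log wording differs from the assumption above.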

>>> 2) Why is the job being removed? One possibility (from the manual) is
>>> that the simulations are exiting with code 100. Is that
>>
>> Then it's not removed; the job is put into error state instead.
>>
>> To investigate this you can use:
>>
>> $ qstat -s z
>
> `qstat -s z` lists the job.
>
>> $ qacct -j <job_id_of_cleaner>
>
> "error: jobid 1234 not found"
> (if the cleaner's job ID was 1234).
>
>> You can even define in a ~/.sge_request file that you get an email
>> for each started/ended/aborted job (the option will then be attached
>> to each job, even when you have no chance to enter such an option in
>> your application).
>
> How do I do that? (Ideally, I'd like email for each job, and for each
> change of status and its reason.)

Just put the line

-m bea

into this file and hope that proper email handling has been set up.
With -M you can optionally specify a target address other than the
local user.
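A complete ~/.sge_request along those lines might look like the fragment below. The address is a placeholder, and `s` (mail when the job is suspended) is added only because the request was for every status change; the letters follow the `-m` description in qsub(1).

```
# ~/.sge_request: options here are added to every job you submit
# b = job begins, e = job ends, a = job aborted or rescheduled, s = job suspended
-m beas
# optional: send the mail somewhere other than the local user (placeholder address)
-M vadim@example.com
```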

-- Reuti

>
>   Vadim
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=251212
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=251213



