[GE users] Jobs still shown as running after process has died

reuti reuti at staff.uni-marburg.de
Fri Aug 13 13:21:25 BST 2010


On 13.08.2010 at 13:57, robhorton wrote:

> On Fri, 2010-08-13 at 12:09 +0200, reuti wrote:
>>> I've got a "live" example at the moment if anyone has any debugging suggestions.
>> 
>> - was the $TMPDIR on the node already removed?
> 
> We don't create a per-job $TMPDIR.

But SGE creates one for your convenience and removes it after the job. It looks like /tmp/1234.1.all.q on the nodes, unless you redefined the location (the tmpdir setting) in the queue configuration.
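To check the first point on a node, a small sketch can build the expected path and test whether it still exists. It assumes the `<job_id>.<task_id>.<queue>` naming from the example above and a default base of `/tmp`. The function names are hypothetical. Set `tmpdir_base` to the queue's tmpdir if you changed it.

```python
import os


def expected_tmpdir(job_id: int, task_id: int, queue: str,
                    tmpdir_base: str = "/tmp") -> str:
    """Return the per-task $TMPDIR path, e.g. /tmp/1234.1.all.q.

    Assumes the "<job_id>.<task_id>.<queue>" naming under tmpdir_base.
    """
    return os.path.join(tmpdir_base, f"{job_id}.{task_id}.{queue}")


def tmpdir_still_present(job_id: int, task_id: int, queue: str,
                         tmpdir_base: str = "/tmp") -> bool:
    """True if the task's $TMPDIR still exists on the node this runs on."""
    return os.path.isdir(expected_tmpdir(job_id, task_id, queue, tmpdir_base))
```

Run it on the execution host that is supposed to hold the task. For example, `tmpdir_still_present(1234, 1, "all.q")` checks for `/tmp/1234.1.all.q`.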


>> - was the job's spool directory $SGE_ROOT/default/spool/<exechost>/active_jobs removed (or is it local, like /var/spool/<exechost>/active_jobs, which would be better)?
> 
> The spool directory has gone.
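The spool check can be sketched the same way. This assumes the execd keeps one entry per running task under `active_jobs`, named `<job_id>.<task_id>`. That naming and the helper names are assumptions here, so compare against a live listing on one of your nodes first.

```python
import os
from typing import List


def task_entry_name(job_id: int, task_id: int) -> str:
    # Assumed naming of a per-task entry in active_jobs, e.g. "1234.1".
    return f"{job_id}.{task_id}"


def active_job_entries(spool_dir: str, exechost: str) -> List[str]:
    """List <spool_dir>/<exechost>/active_jobs, or [] if it does not exist."""
    path = os.path.join(spool_dir, exechost, "active_jobs")
    if not os.path.isdir(path):
        return []
    return sorted(os.listdir(path))


def job_spool_present(spool_dir: str, exechost: str,
                      job_id: int, task_id: int) -> bool:
    """True if the (assumed) per-task entry is still in active_jobs."""
    return task_entry_name(job_id, task_id) in active_job_entries(spool_dir, exechost)
```

Set `spool_dir` to match your layout. Use `$SGE_ROOT/default/spool` for the shared layout or `/var/spool` for the local one mentioned above.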

I assume no accounting record was written for these tasks either.
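`qacct -j <job_id>` is the usual way to confirm that. You can also scan the accounting file directly; by default it is `$SGE_ROOT/default/common/accounting`. The sketch below assumes the colon-separated field order from the accounting(5) man page, with `job_number` as the 6th field and `task_number` as the 36th. Verify those positions against your version's man page before relying on it. The function and constant names are hypothetical.

```python
from typing import Iterable, List

# Assumed 0-based field positions, taken from the accounting(5) field list.
JOB_NUMBER_FIELD = 5
TASK_NUMBER_FIELD = 35


def accounting_records(lines: Iterable[str], job_id: int,
                       task_id: int) -> List[List[str]]:
    """Return the split records matching job_id and task_id."""
    matches = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the '#' header comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) <= TASK_NUMBER_FIELD:
            continue  # too short to be a full record
        if (fields[JOB_NUMBER_FIELD] == str(job_id)
                and fields[TASK_NUMBER_FIELD] == str(task_id)):
            matches.append(fields)
    return matches
```

Pass it an open file object for the accounting file. An empty result for a task that qstat still shows as running matches the situation described here.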


>> - does the qmaster's messages file also have no entry? (loglevel info)
> 
> No, but the loglevel was set to warning; I've changed it and will see if I
> can reproduce the error.
> 
>> - was the email sent at the end of the job?
>> - does the node's "messages" file contain a note about the email?
> 
> The job didn't request an email.

Maybe you can request one for the next submission (e.g. with qsub -m e). Note that you will then get one email per array task.
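The messages-file checks in the list above can be done with a plain search for the job id. One is the qmaster's file at loglevel info; the other is the node's file, searched for a note about the email. The sketch assumes the qmaster file is at `$SGE_ROOT/default/spool/qmaster/messages` and the node's file at `<spool_dir>/<exechost>/messages`. The function name is hypothetical.

```python
import re
from typing import Iterable, List


def lines_mentioning_job(lines: Iterable[str], job_id: int) -> List[str]:
    """Return lines containing job_id as a whole number.

    Matches "1234" and "1234.5" but not "12345". This is crude: any other
    field that happens to be the same number (e.g. a pid) also matches.
    """
    pattern = re.compile(rf"\b{job_id}\b")
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

Open the messages file (with `errors="replace"` in case of stray bytes) and pass the file object along with the job id. Run it once against the qmaster's file and once against the node's file.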

-- Reuti


> Rob
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274265
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274268
