[GE users] Jobs remaining in d state

Jean-Paul Minet minet at cism.ucl.ac.be
Mon May 8 08:23:27 BST 2006



Hi,

I regularly see jobs that users have deleted with qdel remain in the d state.
For example, the qmaster messages file contains:

05/05/2006 14:12:55|qmaster|lmsp|I|hermet has registered the job 11025 for deletion

and three days later, qstat still shows:

11025 0.00581 run.para hermet  dr 05/05/2006 09:40:43 all.q@lmexec-82
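
To spot such jobs without scanning qstat by eye, something like the following
could be used.  It is only a minimal sketch: it assumes the default qstat
column layout shown above (job ID, priority, name, user, state, date, time,
queue) and treats any state string containing a lowercase "d" as "registered
for deletion".  The function names are hypothetical, and the timestamp it
parses is the submit/start time that qstat prints, not the time of the qdel.

```python
from datetime import datetime


def parse_qstat_line(line):
    """Split one qstat job line into its leading columns.

    Returns None for header, separator, or blank lines.
    Assumes the default column order seen in the qstat line above.
    """
    fields = line.split()
    if len(fields) < 7 or not fields[0].isdigit():
        return None
    return {
        "job_id": int(fields[0]),
        "priority": float(fields[1]),
        "name": fields[2],
        "user": fields[3],
        "state": fields[4],
        # Submit/start time as printed by qstat (MM/DD/YYYY HH:MM:SS).
        "timestamp": datetime.strptime(
            fields[5] + " " + fields[6], "%m/%d/%Y %H:%M:%S"
        ),
        # Whatever follows the time column (queue instance, slots, ...).
        "queue": " ".join(fields[7:]),
    }


def jobs_pending_deletion(qstat_lines):
    """Return the parsed jobs whose state contains 'd' (e.g. 'dr')."""
    parsed = (parse_qstat_line(line) for line in qstat_lines)
    return [job for job in parsed if job is not None and "d" in job["state"]]


if __name__ == "__main__":
    sample = [
        "11025 0.00581 run.para hermet  dr 05/05/2006 09:40:43 all.q@lmexec-82",
    ]
    for job in jobs_pending_deletion(sample):
        print(job["job_id"], job["user"], job["state"], job["queue"])
```

In practice the lines would come from the output of qstat itself rather than
from a hard-coded sample.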


No user process is left running on the mpich head/master node or on the
child/slave nodes.  On the head node, the rsh link and the machine file
generated by the startmpi.sh script have been removed from the
/tmp/11025.1.all.q directory, but a qrsh_client_cache file remains there.

Any clue as to where to look for additional information on what prevents SGE
from completing the job deletion?

Thanks

Jean-Paul
