[GE users] Jobs remaining in d state

Jean-Paul Minet minet at cism.ucl.ac.be
Mon May 8 12:37:27 BST 2006



Ron,

> If the node the job runs on is not reachable by qmaster, then
> you will encounter that. You can use "qdel -f" to force a
> cleanup.

The node is reachable by qmaster: another job is running on the same
(dual-processor) node, and its CPU usage is correctly reported and updated in
qstat -ext.  I think the problem lies elsewhere.  The qrsh wrapper/link and
machinefile have been removed from the $TMP directory; doesn't that indicate
that something was done in response to the qdel command, but could not be
carried through to completion?
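
For what it's worth, here is a minimal sketch of how jobs stuck in a deletion
state can be listed.  It assumes the default qstat layout (job ID in field 1,
state in field 5); the function name stuck_deleted_jobs is made up for
illustration and is not part of SGE.

```shell
#!/bin/sh
# Print the IDs of jobs whose qstat state contains "d" (deletion pending),
# e.g. "dr" or "dt".  Reads qstat-style output on stdin.
# Assumption: default qstat columns, job ID in field 1 and state in field 5.
stuck_deleted_jobs() {
    # Skip the header and separator lines: only rows whose first field
    # is numeric are treated as job rows.
    awk '$1 ~ /^[0-9]+$/ && $5 ~ /d/ { print $1 }'
}

# Typical use (needs an SGE client environment):
#   qstat -u hermet | stuck_deleted_jobs
```

Each ID reported this way could then be passed to qdel -f as Ron suggested,
but that only removes the job from qmaster's list; it does not explain why the
normal deletion stalled.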

Jean-Paul

>  -Ron
> 
> 
> --- Jean-Paul Minet <minet at cism.ucl.ac.be> wrote:
> 
>>Hi,
>>
>>Regularly, I see jobs deleted by users (qdel) remaining in the d state.
>>For example, I have in the qmaster messages file:
>>
>>05/05/2006 14:12:55|qmaster|lmsp|I|hermet has registered the job 11025 for deletion
>>
>>and three days later, qstat shows
>>
>>11025 0.00581 run.para hermet  dr 05/05/2006 09:40:43 all.q@lmexec-82
>>
>>
>>There is no user process left running on the mpich head/master node nor
>>on the child/slave nodes.  On the head node, the rsh link and machine file
>>generated by the startmpi.sh script have been removed from the
>>/tmp/11025.1.all.q directory, but a qrsh_client_cache file remains there.
>>
>>Any clue as to where to look for additional info (what prevents SGE from
>>completing job deletion)?
>>
>>Thanks
>>
>>Jean-Paul
>>
>>
> 
> ---------------------------------------------------------------------
> 
>>To unsubscribe, e-mail:
>>users-unsubscribe at gridengine.sunsource.net
>>For additional commands, e-mail:
>>users-help at gridengine.sunsource.net
>>
>>

-- 
Jean-Paul Minet
Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
Université Catholique de Louvain
Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
