[GE users] Jobs stuck in delete status

rayson rayrayson at gmail.com
Tue Jun 9 17:33:31 BST 2009


As the cluster admin user (or another Grid Engine manager or operator),
force the deletion:

% qdel -f <job id>
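
If several jobs are stuck, the same step can be scripted. The following is only a rough sketch, not a tested recipe: it assumes the default qstat column layout (two header lines, state in the fifth field), and the helper name stuck_jobs is made up for illustration.

```shell
# Sketch: print the IDs of jobs whose qstat state contains "d" (e.g. "dr").
# Assumption: default qstat output, where the first two lines are the
# header and a separator, and the state is the fifth whitespace field.
stuck_jobs() {
    awk 'NR > 2 && $5 ~ /d/ { print $1 }'
}

# Possible use on a live cluster, as a Grid Engine manager or operator:
#   qstat -u '*' | stuck_jobs | sort -u | while read -r job_id; do
#       qstat -j "$job_id"   # look at the job before forcing anything
#       qdel -f "$job_id"    # force removal even if the execd does not respond
#   done
```

Keeping the filter separate from the qdel call makes it possible to review the list of job IDs first, which seems prudent since a forced delete does not wait for the execd to confirm.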

Rayson





On 6/9/09, seandavi <seandavi at gmail.com> wrote:
> I'm using 6.2 and have managed to get a couple of jobs stuck in "dr"
> status.  Both were parallel jobs running across multiple machines, but
> both appear to have the "master" task running on the same machine.  I
> have restarted the qmaster and the execd on the machine on which the
> jobs appear to have had the "master" task.  Here is what I have in the
> execd messages file:
>
> 06/09/2009 12:18:46|  main|pressa|I|controlled shutdown 6.2
> 06/09/2009 12:18:53|  main|pressa|I|starting up SGE 6.2 (lx24-amd64)
> 06/09/2009 12:18:53|  main|pressa|W|reaping job "28147" ptf complains: Job does not exist
>
> Any ideas as to what is going on, or how to go further with diagnosing
> the problem?  The cluster has been up and running for months without
> problems.  The only new addition is Open MPI integration; it turns out
> that one of the jobs stuck in "dr" status is an mpirun job.
>
> Thanks,
> Sean
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201328
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201329



