[GE users] Fwd: subnode with empty slots but jobs in queue

jlforrest jlforrest at berkeley.edu
Mon Dec 6 19:12:27 GMT 2010


On 12/6/2010 11:02 AM, reuti wrote:
>
> Was the node only rebooted, or also the local spool directory of SGE
> removed? When the local spool directory exists after the reboot, the
> execd would inform the qmaster about the failed jobs. When there is
> no information on the node about the last running jobs, the execd
> won't tell anything to the qmaster, and on its own it's waiting for
> the jobs to reappear.

This is a Rocks cluster, so after the node
crashed it was reinstalled from scratch. That
removed the local spool directory, which would
explain my problem. In fact, from what you say,
this would happen whenever a Rocks node
is reinstalled if there were SGE jobs running
on it when it crashed, right?

I'm going to manually remove the bogus
jobs.
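
For anyone who hits the same thing, here is a rough sketch of how the
stale jobs could be picked out. It assumes the usual column layout of
"qstat -s r -u '*'" output; the function name, the sample text, and the
host name compute-0-3 are made up for illustration. It only prints
candidate "qdel -f" commands and does not run them. Forced deletion
normally needs manager or operator rights, so check the list by hand
first.

```python
# Sketch only: find jobs that qmaster still lists as running on a node
# that was reinstalled. SAMPLE imitates `qstat -s r -u '*'` output; the
# exact columns on a given cluster are an assumption.

SAMPLE = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
   4211 0.55500 md_run     alice        r     12/05/2010 22:14:03 all.q@compute-0-3.local            4
   4212 0.55500 md_run     alice        r     12/05/2010 22:15:10 all.q@compute-0-7.local            4
   4215 0.50500 dock       bob          r     12/06/2010 01:02:44 all.q@compute-0-3.local            1
   4220 0.00000 dock       bob          qw    12/06/2010 09:30:00                                    1
"""


def jobs_on_host(qstat_text, host):
    """Return job IDs whose queue instance (queue@host) is on `host`."""
    job_ids = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job lines start with a numeric job ID; this skips the header
        # line and the dashed separator.
        if not fields or not fields[0].isdigit():
            continue
        for field in fields:
            queue, sep, node = field.partition("@")
            # Match the short host name or its fully qualified form,
            # so compute-0-3 does not also match compute-0-30.
            if sep and (node == host or node.startswith(host + ".")):
                job_ids.append(fields[0])
                break
    return job_ids


# Print the commands for review instead of executing them.
for job_id in jobs_on_host(SAMPLE, "compute-0-3"):
    print("qdel -f", job_id)
```

On a live system the text would come from the real qstat output rather
than SAMPLE.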

As always, thanks for your help. You deserve
a Nobel Prize.

-- 
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
jlforrest at berkeley.edu

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=302535
