Re(2): [GE users] slot taken no job running

Don Shesnicky dshesnicky at enqsemi.com
Tue Aug 17 18:21:08 BST 2004


 

> Did you have a look at the messages file of the queue master in
> $SGE_ROOT/default/spool/qmaster/messages?
>
> I would suggest shutting down the execd on kane.enqsemi.com, then going to
> the spool directory of this node, $SGE_ROOT/default/spool/kane.enqsemi.com,
> looking inside the directories active_jobs, jobs, and job_scripts, and
> removing whatever is inside. (You can also have a look there before you
> shut down anything, just to see whether more than one job is mentioned at
> all; otherwise we have to look somewhere else.) Then restart the execd and
> we will see whether it's gone.
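
The "have a look there before you shut down anything" step can be done read-only. Below is a minimal sketch, not an SGE tool: the function name `list_spool_leftovers` is hypothetical, and the spool path is passed in by the caller because the exact directory name depends on the installation.

```python
import os
from pathlib import Path

# The three execd spool subdirectories named in the quoted advice.
SPOOL_SUBDIRS = ("active_jobs", "jobs", "job_scripts")


def list_spool_leftovers(spool_dir):
    """Return {subdirectory: sorted entry names}; nothing is removed."""
    leftovers = {}
    for sub in SPOOL_SUBDIRS:
        path = Path(spool_dir) / sub
        # A missing subdirectory is reported as empty rather than as an error.
        leftovers[sub] = sorted(p.name for p in path.iterdir()) if path.is_dir() else []
    return leftovers


if __name__ == "__main__":
    # Assumption: SGE_ROOT is set in the environment; "." is only a fallback.
    sge_root = os.environ.get("SGE_ROOT", ".")
    spool = Path(sge_root) / "default" / "spool" / "kane.enqsemi.com"
    for sub, entries in list_spool_leftovers(spool).items():
        print(f"{sub}: {entries if entries else 'empty'}")
```

If any of the three lists is non-empty while no job is running on the host, those entries are the candidates for cleanup once the execd is stopped.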

I've looked in the qmaster messages file and only this stands out:

08/16/2004 09:40:27|qmaster|canter|E|error writing object with key "EXECHOST:kane.enqsemi.com" into berkeley database: (28) No space left on device
08/16/2004 09:43:11|qmaster|canter|W|job 8062.1 failed on host kane.enqsemi.com general assumedly before job because: can't create directory active_jobs/8062.1: No space left on device
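
Picking such entries out of a long messages file can be scripted. The following is a sketch that assumes every entry has the five pipe-separated fields seen in the two lines above (timestamp, daemon, host, level, message) and that `E` and `W` are the levels of interest; the names `LogEntry`, `parse_messages`, and `problems` are made up for this illustration.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    daemon: str
    host: str
    level: str
    message: str


def parse_messages(lines):
    """Split each line into five fields; lines that do not fit are skipped."""
    entries = []
    for line in lines:
        # maxsplit=4 keeps any "|" inside the message text intact.
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) == 5:
            entries.append(LogEntry(*parts))
    return entries


def problems(entries, needle):
    """Return error/warning entries whose message mentions `needle`."""
    return [e for e in entries if e.level in ("E", "W") and needle in e.message]


# The two entries from the excerpt above, each on a single line.
SAMPLE = [
    '08/16/2004 09:40:27|qmaster|canter|E|error writing object with key '
    '"EXECHOST:kane.enqsemi.com" into berkeley database: (28) No space left on device',
    "08/16/2004 09:43:11|qmaster|canter|W|job 8062.1 failed on host "
    "kane.enqsemi.com general assumedly before job because: can't create "
    "directory active_jobs/8062.1: No space left on device",
]

if __name__ == "__main__":
    for entry in problems(parse_messages(SAMPLE), "kane.enqsemi.com"):
        print(entry.timestamp, entry.level, entry.message)
```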

I take it that job 8062.1 is the problem and is probably stuck in the
Berkeley DB.

I've completely removed the kane directory in $SGE_ROOT/default/spool/kane
and re-installed kane as an exec host, but that did nothing: it still shows
1/2 slots in use while no jobs are running on it.


----------------------------------------------------------------------------
d.norm@kane.enqsemi.com        BIP   1/2       0.14     lx24-x86      o
----------------------------------------------------------------------------

job-ID  prior   name       user         state submit/start at     queue                         slots ja-task-ID
----------------------------------------------------------------------------------------------------------------
   8591 0.56000 tc_049_ab_ mllalami     r     08/17/2004 12:31:08 d.norm@dexter.enqsemi.com     1
   8592 0.56000 tc_049_ab_ mllalami     r     08/17/2004 12:34:19 d.norm@dexter.enqsemi.com     1
   8589 0.56000 tc_037_ei_ jrusmussen   r     08/17/2004 12:30:02 d.norm@forge.enqsemi.com      1
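
The mismatch in this output can be checked mechanically: compare the used slot count on the queue line with the slots of running jobs assigned to that queue instance. This is only a sketch under stated assumptions: whitespace-separated columns in the order shown above, the archive's " at " written back as "@", and hypothetical helper names `used_slots` and `running_on`.

```python
def used_slots(queue_line):
    """Parse a queue line: queuename, qtype, used/total, load_avg, arch, states."""
    fields = queue_line.split()
    used, total = fields[2].split("/")
    return fields[0], int(used), int(total)


def running_on(queue_instance, job_lines):
    """Sum the slots of jobs in state 'r' on the given queue instance."""
    count = 0
    for line in job_lines:
        # Columns: job-ID, prior, name, user, state, date, time, queue, slots
        fields = line.split()
        if len(fields) >= 9 and fields[4] == "r" and fields[7] == queue_instance:
            count += int(fields[8])
    return count


QUEUE_LINE = "d.norm@kane.enqsemi.com        BIP   1/2       0.14     lx24-x86      o"
JOB_LINES = [
    "8591 0.56000 tc_049_ab_ mllalami     r     08/17/2004 12:31:08 d.norm@dexter.enqsemi.com 1",
    "8592 0.56000 tc_049_ab_ mllalami     r     08/17/2004 12:34:19 d.norm@dexter.enqsemi.com 1",
    "8589 0.56000 tc_037_ei_ jrusmussen   r     08/17/2004 12:30:02 d.norm@forge.enqsemi.com  1",
]

if __name__ == "__main__":
    name, used, total = used_slots(QUEUE_LINE)
    running = running_on(name, JOB_LINES)
    if used != running:
        print(f"{name}: {used}/{total} slots marked used, {running} accounted for by running jobs")
```

A non-zero difference, as here, points at a slot held by a job that the job list no longer shows.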




Don

