Re(2): [GE users] slot taken no job running

Don Shesnicky dshesnicky at
Tue Aug 17 18:21:08 BST 2004


> Did you had a look at the messages file of the queue master in
> I would suggest to shut down the execd on, then go to the spool
> directory of this node $SGE_ROOT/default/spool/ and look inside the
> directories: active_jobs, jobs, job_scripts and remove all what's ever
> inside (you can also have a look there before you shut down anything,
> just to see, whether there is more than one job mentioned at all,
> otherwise we have to look somewhere else). Then restart the execd and
> we will see, whether it's
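
For anyone following the quoted advice, the "have a look there before you shut down anything" step could be sketched roughly as below. This is only an illustration, not an SGE tool: the function name list_spool_leftovers is hypothetical, the three subdirectory names come from the quoted text, and the path you pass in is assumed to be the node's own spool directory. It only reads; it removes nothing.

```python
from pathlib import Path

# Subdirectory names taken from the quoted advice; nothing else is assumed
# about the layout of the spool directory.
SPOOL_SUBDIRS = ("active_jobs", "jobs", "job_scripts")


def list_spool_leftovers(node_spool_dir):
    """Return {subdirectory: sorted entry names} for one node's spool directory.

    Hypothetical helper for illustration. It is read-only: nothing is deleted.
    A subdirectory that does not exist is reported as an empty list.
    """
    root = Path(node_spool_dir)
    leftovers = {}
    for name in SPOOL_SUBDIRS:
        subdir = root / name
        if subdir.is_dir():
            leftovers[name] = sorted(entry.name for entry in subdir.iterdir())
        else:
            leftovers[name] = []
    return leftovers
```

Calling list_spool_leftovers() on the node's spool directory and printing the result shows whether any job is still mentioned there before anything is shut down or removed.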

I've looked in the qmaster messages file and only this stands out:

08/16/2004 09:40:27|qmaster|canter|E|error writing object with key
   into berkeley database: (28) No space left on device
08/16/2004 09:43:11|qmaster|canter|W|job 8062.1 failed on host general assumedly
   before job because: can't create directory active_jobs/8062.1: No space left on device

I take it that job 8062.1 is the problem and is probably stuck in the
Berkeley DB.
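
In case it helps others scan their own messages file, here is a minimal sketch of how the error and warning lines might be picked out. It assumes every entry has the five pipe-separated fields visible in the excerpt above (timestamp, component, host, level, message) and that a line without that shape is a wrapped continuation of the previous entry. The names LogEntry, parse_messages and problems are made up for this example.

```python
from collections import namedtuple

# Field names are my own labels for the five pipe-separated columns
# seen in the excerpt; they are not official SGE terminology.
LogEntry = namedtuple("LogEntry", "timestamp component host level message")


def parse_messages(lines):
    """Parse messages-file lines into LogEntry tuples.

    Assumption: an entry looks like 'timestamp|component|host|level|message'.
    Any non-empty line that does not split into five fields is treated as a
    continuation and appended to the previous entry's message.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        parts = line.split("|", 4)
        if len(parts) == 5:
            entries.append(LogEntry(*parts))
        elif entries and line.strip():
            last = entries[-1]
            entries[-1] = last._replace(message=last.message + " " + line.strip())
    return entries


def problems(entries, levels=("E", "W")):
    """Keep only entries whose level is in `levels` (E and W appear above)."""
    return [entry for entry in entries if entry.level in levels]
```

Feeding the open messages file to parse_messages() and then problems() leaves only entries like the two shown above, which makes it easier to see whether other jobs hit the same "No space left on device" error.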

I've completely removed the kane directory ($SGE_ROOT/default/spool/kane)
and re-installed kane as an exec host, but that did nothing: it still
shows 1 of 2 slots in use while no jobs are running on it.

d.norm at        BIP   1/2       0.14     lx24-x86      o

job-ID  prior   name       user         state submit/start at     queue      slots ja-task-ID
   8591 0.56000 tc_049_ab_ mllalami     r     08/17/2004 12:31:08 d.norm at  1
   8592 0.56000 tc_049_ab_ mllalami     r     08/17/2004 12:34:19 d.norm at  1
   8589 0.56000 tc_037_ei_ jrusmussen   r     08/17/2004 12:30:02 d.norm at  1

