[GE users] job in Rq state never runs

King, Stefan sking at sepaton.com
Wed Mar 29 01:09:48 BST 2006


I need some ideas to try.

The job is a DRMAA binary. It ran two hours ago, but since then it will not run.

In the qmaster messages file there is:

03/28/2006 18:34:01|qmaster|node0|W|job 7.1 failed on host node0 rescheduling because: 03/28/2006 18:34:00 [0:21513]: exit_status of job start = 99
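
When the messages file is large, it can help to pull out every "exit_status of job start" warning at once to see whether only this job, or only node0, is affected. Below is a minimal Python sketch. It assumes the pipe-delimited layout of the single line quoted above (timestamp|daemon|host|severity|text). The helpers parse_message_line and job_start_failures are hypothetical names and are not part of Grid Engine.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class MessageRecord(NamedTuple):
    # Field layout assumed from the single qmaster line quoted in this post.
    timestamp: str
    daemon: str
    host: str
    severity: str
    text: str


def parse_message_line(line: str) -> Optional[MessageRecord]:
    """Split one messages-file line; return None if it has fewer than 5 fields."""
    # maxsplit=4 keeps any '|' inside the free-text message intact.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return MessageRecord(*parts)


# Matches e.g. "job 7.1 failed on host node0 ... exit_status of job start = 99"
_START_FAILURE = re.compile(
    r"job (\S+) failed on host (\S+) .*exit_status of job start = (\d+)"
)


def job_start_failures(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (job_id, host, exit_status) for each job-start failure warning."""
    for line in lines:
        rec = parse_message_line(line)
        if rec is None:
            continue
        m = _START_FAILURE.search(rec.text)
        if m:
            yield m.group(1), m.group(2), int(m.group(3))


if __name__ == "__main__":
    sample = ("03/28/2006 18:34:01|qmaster|node0|W|job 7.1 failed on host node0 "
              "rescheduling because: 03/28/2006 18:34:00 [0:21513]: "
              "exit_status of job start = 99")
    for job_id, host, status in job_start_failures([sample]):
        print(job_id, host, status)  # prints: 7.1 node0 99
```

An open file handle can be passed straight to job_start_failures, since iterating over it yields lines.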

simple.sh can be submitted and runs.

Output of uptime:

18:37:39  up  7:17,  3 users,  load average: 3.06, 3.06, 3.01

Using qconf -mq all.q, I set

load_thresholds       np_load_avg=10

because the threshold was at 1.75, and then restarted the master, queue, and exec daemons.
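
Here is a rough illustration of why the threshold change matters. np_load_avg is the load average divided by the number of processors. A queue instance goes into alarm state, and stops receiving new jobs, once that value trips the configured load_thresholds entry. The Python sketch below assumes np_load_avg keeps its default ">=" relational operator. It also uses a hypothetical processor count, since the number of CPUs on node0 is not given. The helper names are illustrative only.

```python
import operator

# Comparison operators handled by this sketch (assumption: np_load_avg uses ">=").
_RELOPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
    "!=": operator.ne,
}


def np_load_avg(load_avg: float, num_proc: int) -> float:
    """Load average normalized by processor count."""
    return load_avg / num_proc


def in_load_alarm(load_avg: float, num_proc: int, threshold: float,
                  relop: str = ">=") -> bool:
    """True if the normalized load trips the threshold (queue would be in alarm)."""
    return _RELOPS[relop](np_load_avg(load_avg, num_proc), threshold)


if __name__ == "__main__":
    # 3.06 comes from the uptime output above; num_proc=1 is hypothetical.
    print(in_load_alarm(3.06, 1, 1.75))  # True: old threshold trips the alarm
    print(in_load_alarm(3.06, 1, 10.0))  # False: new threshold does not
```

With two processors the same load normalizes to 1.53, which would stay below 1.75. So whether the old threshold mattered depends on the CPU count of node0.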

qstat just shows the job in state Rq.

I have qdel'd it and resubmitted it a few times.

qacct -j shows that it failed with reason 25 (rescheduling) and exit status 99.

A similar thing happened in January, and the cause turned out to be that the binary was not resolvable, but that does not seem to be the case now.

SGE version is 6.02u4

Any ideas?

Stefan

More information about the gridengine-users mailing list