[GE users] job in Rq state never runs

Rayson Ho rayrayson at gmail.com
Wed Mar 29 01:11:50 BST 2006



It looks like the job exited with status 99, and exit status 99 has a
special meaning in SGE -- it means requeue the job.
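
If the binary can return 99 for its own reasons, one way to test this
theory is to run it through a small wrapper that keeps 99 from reaching
SGE. A minimal sketch in Python; the function names and the replacement
value 1 are my own assumptions for illustration, not anything SGE or
DRMAA provides:

```python
import subprocess

# Exit status that SGE interprets as "requeue this job".
SGE_REQUEUE = 99


def remap_exit_status(status, replacement=1):
    """Return `replacement` instead of 99; pass other statuses through."""
    if status == SGE_REQUEUE:
        return replacement
    return status


def run_wrapped(argv):
    """Run the command in argv and return its exit status with 99 remapped."""
    completed = subprocess.run(argv)
    return remap_exit_status(completed.returncode)
```

Submitting a job script that calls run_wrapped() on the real command and
exits with the returned value should make a genuine failure show up as a
plain non-zero exit in qacct instead of leaving the job in Rq.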

Rayson



On 3/28/06, King, Stefan <sking at sepaton.com> wrote:
>
>
>
> I need some ideas to try.
>
>
>
> The job is a DRMAA binary that ran 2 hours ago; since then, it will not run.
>
>
>
> in qmaster messages there is:
>
> 03/28/2006 18:34:01|qmaster|node0|W|job 7.1 failed on host node0
> rescheduling because: 03/28/2006 18:34:00 [0:21513]: exit_status of job
> start = 99
>
>
>
> simple.sh can be submitted and will run
>
>
>
> uptime
>
> 18:37:39  up  7:17,  3 users,  load average: 3.06, 3.06, 3.01
>
>
>
> using qconf -mq all.q
>
>
>
> I set
>
>
>
> load_thresholds       np_load_avg=10
>
>
>
> and restarted the master, queue, and exec daemons, because it was at 1.75.
>
>
>
> qstat just shows Rq.
>
>
>
> I have qdel'd it and resubmitted a few times.
>
>
>
> qacct -j
>
> shows that it failed for reason 25, rescheduling
>
> exit status 99
>
>
>
> A similar thing happened in January, and the cause turned out to be that
> the binary was not resolvable, but that does not seem to be the case now.
>
>
>
> SGE version is 6.02u4
>
>
>
> Any ideas?
>
>
>
> Stefan
>
>
>
>
