[GE users] Job causes queues to go into error state

Iwona Sakrejda isakrejda at lbl.gov
Thu Dec 4 09:49:05 GMT 2008


On 12/3/08 3:05 PM, Iwona Sakrejda wrote:
> Hi,
>
> I have this one user whose jobs are flushing through hosts and pushing 
> queues into error state.
> I cannot figure it out. Here is a snippet from an e-mail generated by
> such a job.
>
> 12/03/2008 14:27:51 [171:13726]: wait3 returned 13727 (status: 32512; 
> WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 127)
> 12/03/2008 14:27:51 [171:13726]: prolog exited with exit status 127
> 12/03/2008 14:27:51 [171:13726]: reaped "prolog" with pid 13727
> 12/03/2008 14:27:51 [171:13726]: prolog exited not due to signal
> 12/03/2008 14:27:51 [171:13726]: prolog exited with status 127
> 12/03/2008 14:27:51 [171:13726]: exit_status of prolog = 127
> 12/03/2008 14:27:51 [171:13726]: no epilog script to start
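
For anyone reading the wait3 line above, here is a rough sketch of how the raw
status 32512 breaks down into the WIFEXITED / WEXITSTATUS values the shepherd
prints. It assumes the traditional Unix status layout (low 7 bits hold the
terminating signal, the next 8 bits hold the exit code); the function name
decode_wait_status is made up for illustration and is not part of Grid Engine.

```python
def decode_wait_status(status: int) -> dict:
    """Split a raw wait status, assuming the traditional Unix bit layout."""
    term_signal = status & 0x7F              # low 7 bits: signal number, 0 if none
    exited = term_signal == 0                # corresponds to WIFEXITED
    signaled = term_signal not in (0, 0x7F)  # corresponds to WIFSIGNALED (0x7F = stopped)
    exit_status = (status >> 8) & 0xFF if exited else None  # WEXITSTATUS
    return {
        "WIFEXITED": exited,
        "WIFSIGNALED": signaled,
        "WEXITSTATUS": exit_status,
    }


# Status value taken from the shepherd trace quoted above.
info = decode_wait_status(32512)
print(info)
```

So the prolog exited on its own with code 127. By shell convention, 127 usually
means "command not found", so one possible place to start (an assumption, not
something the trace proves) is whether the prolog path and its interpreter
resolve on that host for that particular user.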

The qmaster log also has these messages:

> 12/03/2008 15:23:51|qmaster|pc2533|W|job 533211.1 failed on host 
> pc1903.nersc.gov general in prolog because: 12/03/2008 15:23:51 
> [171:12905]: exit_status of prolog = 127
> 12/03/2008 15:23:51|qmaster|pc2533|W|rescheduling job 533211.1
> 12/03/2008 15:23:51|qmaster|pc2533|E|queue all.64bit.q marked QERROR 
> as result of job 533211's failure at host pc1903.nersc.gov
>
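
To see which jobs and hosts keep tripping queues into error state, the qmaster
messages can be filtered with something like the sketch below. It assumes the
pipe-separated layout seen in the excerpt above (time|component|host|level|text)
with each message on a single unwrapped line; the helper names are hypothetical
and the regular expression only covers the QERROR wording quoted here.

```python
import re

# Matches the wording of the QERROR message quoted in this thread.
QERROR_RE = re.compile(
    r"queue (\S+) marked QERROR as result of job (\d+)'s failure at host (\S+)"
)


def parse_message_line(line):
    """Split one log line into its five pipe-separated fields, or return None."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, component, host, level, text = parts
    return {"time": timestamp, "component": component,
            "host": host, "level": level, "text": text}


def find_qerror_events(lines):
    """Collect (queue, job, host) details for every error-level QERROR message."""
    events = []
    for line in lines:
        rec = parse_message_line(line)
        if rec is None or rec["level"] != "E":
            continue
        match = QERROR_RE.search(rec["text"])
        if match:
            events.append({"time": rec["time"], "queue": match.group(1),
                           "job": match.group(2), "host": match.group(3)})
    return events


# Lines copied from the excerpt above, joined back onto single lines.
sample = [
    "12/03/2008 15:23:51|qmaster|pc2533|W|rescheduling job 533211.1",
    "12/03/2008 15:23:51|qmaster|pc2533|E|queue all.64bit.q marked QERROR "
    "as result of job 533211's failure at host pc1903.nersc.gov",
]
events = find_qerror_events(sample)
print(events)
```

Grouping the resulting events by job or by host should show whether the
failures follow the one user's jobs or particular nodes.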

> Other jobs are running happily on those nodes.
> Could you suggest where to start looking for the cause?
>
>
> Thanks a lot,
>
> iwona
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=91058
