[GE users] Decoding "failed" messages

Iwona Sakrejda isakrejda at lbl.gov
Mon Apr 11 22:55:59 BST 2005


Hi,

Where can I look up the error codes shown in the "failed" field
of qacct -j output?

Some of the jobs are failing with this message:
"failed       100 : assumedly after job"
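To pull the numeric code and its text out of that line (for example, to scan many qacct -j records at once), a minimal Python sketch could look like the one below. It assumes the "failed <code> : <description>" layout of the line quoted above. The name parse_failed is hypothetical, and the case of a plain "failed 0" line with no description is an assumption about how successful jobs are reported.

```python
import re

# Matches the "failed" line as it appears in the qacct -j output quoted above,
# e.g. "failed       100 : assumedly after job". The ": description" part is
# optional because a job that did not fail is assumed to show just "failed 0".
FAILED_RE = re.compile(r"^failed\s+(\d+)\s*(?::\s*(.*?))?\s*$")


def parse_failed(line):
    """Return (code, description) for a qacct "failed" line, or None.

    parse_failed is a hypothetical helper, not part of Grid Engine.
    """
    m = FAILED_RE.match(line.strip())
    if m is None:
        return None  # not a "failed" line
    code = int(m.group(1))
    description = m.group(2) or ""  # empty when no ": text" follows the code
    return code, description


if __name__ == "__main__":
    sample = "failed       100 : assumedly after job"
    print(parse_failed(sample))  # prints (100, 'assumedly after job')
```

This only splits the code from its text. It does not explain what code 100 means.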

I tried to find more information about them, and the qmaster messages
file has the following entry for this job:

Tue Apr  5 12:29:51 2005|qmaster|pdsfcore03|W|job 404796.1 failed on host pc2515.nersc.gov 
assumedly after job because: job 404796.1 died through signal KILL (9)
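The entry above can be broken into its parts with a short Python sketch. It assumes that an entry is a single pipe-delimited line (timestamp|daemon|host|level|message), that the e-mail wrapping of the quoted entry is not part of the file, and that similar failures are worded the same way as this one. The column names, MessagesEntry, parse_entry and describe_failure are all hypothetical labels chosen for illustration.

```python
import re
from typing import NamedTuple, Optional


class MessagesEntry(NamedTuple):
    # Hypothetical names for the five pipe-separated columns; the third column
    # looks like the host running qmaster (pdsfcore03), not the execution host.
    timestamp: str
    daemon: str
    host: str
    level: str
    message: str


# Patterns written against the single entry quoted above; assumed, not
# guaranteed, to match other failure messages.
JOB_HOST_RE = re.compile(r"job (\S+) failed on host (\S+)")
SIGNAL_RE = re.compile(r"died through signal (\w+) \((\d+)\)")


def parse_entry(line: str) -> MessagesEntry:
    # maxsplit=4 keeps any "|" inside the free-text message intact.
    fields = line.rstrip("\n").split("|", 4)
    if len(fields) != 5:
        raise ValueError("not a pipe-delimited messages entry: %r" % line)
    return MessagesEntry(*fields)


def describe_failure(entry: MessagesEntry) -> Optional[dict]:
    """Extract job id, execution host and killing signal, if present."""
    job = JOB_HOST_RE.search(entry.message)
    if job is None:
        return None  # not a "job ... failed on host ..." message
    sig = SIGNAL_RE.search(entry.message)
    return {
        "job": job.group(1),
        "exec_host": job.group(2),
        "signal": sig.group(1) if sig else None,
        "signal_number": int(sig.group(2)) if sig else None,
    }


if __name__ == "__main__":
    # The quoted entry, re-joined onto one line.
    line = ("Tue Apr  5 12:29:51 2005|qmaster|pdsfcore03|W|job 404796.1 failed "
            "on host pc2515.nersc.gov assumedly after job because: job 404796.1 "
            "died through signal KILL (9)")
    print(describe_failure(parse_entry(line)))
    # prints {'job': '404796.1', 'exec_host': 'pc2515.nersc.gov',
    #         'signal': 'KILL', 'signal_number': 9}
```

This only restates what the log already says (which host, which signal). It does not reveal who or what sent the KILL.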

That did not explain much either. The job runs fine when resubmitted.
How can I figure out why these jobs are dying?

Any help sorting this out would be appreciated.

Iwona Sakrejda


More information about the gridengine-users mailing list