[GE users] Decoding "failed" messages

Ron Chen ron_chen_123 at yahoo.com
Mon Apr 11 23:46:38 BST 2005


Was it killed by the user directly (outside of SGE)?
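
If it is not obvious who sent the signal, it may help to pull every such entry out of qmaster/messages and see whether the kills cluster on particular hosts or times. Below is a rough Python sketch that parses the warning line you quoted. The regular expression is an assumption based only on that one line, and parse_failed_line is a made-up helper name, not part of SGE; other message layouts may need a different pattern.

```python
import re

# Pattern modelled on the single warning line quoted in this thread.
# Assumption: other "failed" entries in qmaster/messages follow the same layout.
FAILED_RE = re.compile(
    r"job (?P<job>\d+)\.(?P<task>\d+) failed on host (?P<host>\S+) "
    r"(?P<stage>.+?) because: .*?died through signal "
    r"(?P<signame>\w+) \((?P<signum>\d+)\)"
)


def parse_failed_line(line):
    """Return a dict of fields from one messages line, or None if it does not match."""
    # Fields are pipe-separated: timestamp|daemon|host|level|message
    parts = line.strip().split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, _daemon, _master_host, _level, message = parts
    match = FAILED_RE.search(message)
    if match is None:
        return None
    info = match.groupdict()
    info["timestamp"] = timestamp
    info["signum"] = int(info["signum"])
    return info


# The entry from the original question, rejoined into a single line.
line = (
    "Tue Apr  5 12:29:51 2005|qmaster|pdsfcore03|W|job 404796.1 failed on host "
    "pc2515.nersc.gov assumedly after job because: job 404796.1 died "
    "through signal KILL (9)"
)
print(parse_failed_line(line))
```

Running this over the whole messages file and grouping the results by host would show whether one execution host accounts for most of the killed jobs.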

 -Ron


--- Iwona Sakrejda <isakrejda at lbl.gov> wrote:
> Hi,
> 
> Where can I look up error codes shown after the
> "failed" entry
> from qacct -j?
> 
> Some of the jobs are failing with a message:
> "failed       100 : assumedly after job"
> 
> I was trying to find some more info about them and
> the qmaster/messages file
> has the following entry for this job:
> 
> Tue Apr  5 12:29:51 2005|qmaster|pdsfcore03|W|job 404796.1 failed on host pc2515.nersc.gov assumedly after job because: job 404796.1 died through signal KILL (9)
> 
> That did not explain much either. The job runs just
> fine if resubmitted.
> How can I figure out why those jobs are dying?
> 
> Any assistance in sorting this out would be
> appreciated.
> 
> Iwona Sakrejda
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
> 


		




