[GE users] Decoding "failed" messages

Reuti reuti at staff.uni-marburg.de
Tue Apr 12 00:12:12 BST 2005



Quoting Iwona Sakrejda <isakrejda at lbl.gov>:

> 
> 
> Ron Chen wrote:
> 
> > Was it killed by the user directly (outside of SGE)?
> > 
> The user thinks he did not do it, and he has no direct
> access to the batch node. However, under some circumstances he
> might be able to run two jobs on the same host, so in principle
> one job could kill the other. Still, he swears that none of
> his jobs issues a kill.

Why is it a problem for your application to run twice on the same node?
Can this be adjusted? Did the job exceed any of its requested limits? - Reuti

> 
> Iwona
> 
> 
> >  -Ron
> > 
> > 
> > --- Iwona Sakrejda <isakrejda at lbl.gov> wrote:
> > 
> >>Hi,
> >>
> > >>Where can I look up the error codes shown after the
> > >>"failed" entry in qacct -j output?
> > >>
> > >>Some of the jobs are failing with the message:
> > >>"failed       100 : assumedly after job"
> >>
> > >>I tried to find more information about them, and the
> > >>qmaster/messages file has the following entry for this job:
> >>
> > >>Tue Apr  5 12:29:51 2005|qmaster|pdsfcore03|W|job 404796.1 failed on host pc2515.nersc.gov assumedly after job because: job 404796.1 died through signal KILL (9)
> >>
> > >>That did not explain much either. The job runs fine
> > >>when resubmitted. How can I figure out why these jobs
> > >>are dying?
> > >>
> > >>Any help sorting this out would be appreciated.
> >>
> >>Iwona Sakrejda
> >>
> >>
> >>
> > 
> > ---------------------------------------------------------------------
> > 
> >>To unsubscribe, e-mail:
> >>users-unsubscribe at gridengine.sunsource.net
> >>For additional commands, e-mail:
> >>users-help at gridengine.sunsource.net
> >>
> >>
> > 
> > 
> > 
> > 
> > 
> 
> 
> 
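When several jobs fail, it can help to pull the failed code and its text out of qacct -j output before looking the codes up. A minimal Python sketch, assuming the "failed <code> : <description>" layout shown in the qacct -j output quoted above; FAILED_RE and parse_failed are hypothetical names, not part of Grid Engine.

```python
import re
from typing import Optional, Tuple

# Assumed layout of the 'failed' line, taken from the output quoted in this
# thread: "failed       100 : assumedly after job". The text after the colon
# may be absent (e.g. "failed       0").
FAILED_RE = re.compile(r"^failed\s+(\d+)\s*(?::\s*(.*))?$")


def parse_failed(qacct_output: str) -> Optional[Tuple[int, str]]:
    """Return (code, description) for the first 'failed' line, or None."""
    for line in qacct_output.splitlines():
        match = FAILED_RE.match(line.strip())
        if match:
            code = int(match.group(1))
            description = (match.group(2) or "").strip()
            return code, description
    return None


if __name__ == "__main__":
    sample = (
        "hostname     pc2515.nersc.gov\n"
        "failed       100 : assumedly after job\n"
    )
    print(parse_failed(sample))  # (100, 'assumedly after job')
```

On a host with Grid Engine installed, the input could be taken from `subprocess.run(["qacct", "-j", "404796"], capture_output=True, text=True).stdout` (Python 3.7 or later).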


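To gather every messages-file entry for a failing job, together with the signal it reports, the pipe-separated lines can be split into fields. A minimal Python sketch, assuming the layout of the qmaster entry quoted above (timestamp, daemon, host, severity letter, message text) and the wording "died through signal KILL (9)"; MessageEntry, parse_entry, entries_for_job and killing_signal are hypothetical helper names, not part of Grid Engine.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class MessageEntry(NamedTuple):
    """One messages-file line split on '|' (assumed field order)."""
    timestamp: str
    daemon: str
    host: str
    level: str
    text: str


# Assumed wording, copied from the quoted entry: "... died through signal KILL (9)".
SIGNAL_RE = re.compile(r"died through signal (\w+) \((\d+)\)")


def parse_entry(line: str) -> Optional[MessageEntry]:
    """Split one line into its five fields; None if it does not fit."""
    parts = line.rstrip("\n").split("|", 4)  # at most 5 fields; text may contain '|'
    if len(parts) != 5:
        return None
    return MessageEntry(*parts)


def entries_for_job(lines: List[str], job_id: str) -> List[MessageEntry]:
    """Return the entries whose text mentions 'job <job_id>' as a whole token."""
    pattern = re.compile(rf"\bjob {re.escape(job_id)}\b")
    entries = []
    for line in lines:
        entry = parse_entry(line)
        if entry is not None and pattern.search(entry.text):
            entries.append(entry)
    return entries


def killing_signal(entry: MessageEntry) -> Optional[Tuple[str, int]]:
    """Return (signal name, signal number) if the entry reports one."""
    match = SIGNAL_RE.search(entry.text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

For example, `entries_for_job(open(path).readlines(), "404796.1")` on the qmaster messages file would include the entry quoted above, where `path` is wherever your installation keeps that file.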

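The qmaster entry reports "signal KILL (9)", and qacct -j also prints an exit_status field. Turning these numbers into names can help when checking whether a limit was hit; queue_conf(5), for instance, documents that a job exceeding its hard wall-clock limit h_rt is aborted via SIGKILL. A minimal Python sketch, assuming the common shell convention that an exit status above 128 means "terminated by signal (status - 128)"; describe_signal and decode_exit_status are hypothetical names.

```python
import signal


def describe_signal(signum: int) -> str:
    """Map a signal number to its name on this platform, e.g. 9 -> 'SIGKILL'."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"unknown signal {signum}"


def decode_exit_status(status: int) -> str:
    """Interpret an exit status using the 128 + signal number convention.

    Assumption: a status above 128 means the process was terminated by signal
    (status - 128). A program can also exit with such a value on its own, so
    treat the result as a hint rather than proof.
    """
    if status > 128:
        return f"killed by {describe_signal(status - 128)}"
    return f"exited with code {status}"
```

Under that convention, a job killed with signal 9 would show 137 (128 + 9), a value worth confirming against your own accounting records.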



