[GE users] Decoding "failed" messages

Iwona Sakrejda isakrejda at lbl.gov
Tue Apr 12 00:28:56 BST 2005



Reuti wrote:
> Quoting Iwona Sakrejda <isakrejda at lbl.gov>:
> 
> 
>>
>>Ron Chen wrote:
>>
>>
>>>Was it killed by the user directly (outside of SGE)?
>>>
>>
>>The user thinks he did not do it and he has no direct
>>access to the batch node, however under some circumstances he
>>might be able to run two jobs on the same host, so one job
>>could kill the other in principle. However he swears that
> 
> 
> Why is it a problem for your application to run two times on the same node - 
> can this be adjusted? Did the job exceed any requested limits? - Reuti
> 
There is no problem for the application to run twice on the same node,
and the job did not exceed any limits as far as I can tell. I mentioned the
two-job scenario because that would be the only way a user could kill a process
belonging to a job from outside of that job. That was a reply to the question of
whether the user could have killed a process directly (and not by killing the job).

Actually, I see this message for several different users. The messages file has
about 44k entries, and about 1.5k of them report a job killed "assumedly after job".
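
To check whether these failures cluster on particular hosts, the entries
could be tallied with something like the sketch below. It is only an
illustration: it assumes every entry sits on a single line of the
qmaster messages file and is worded like the one quoted further down in
this thread. The function name summarize and the command-line handling
are made up for the example.

```python
import re
import sys
from collections import Counter

# Pattern modelled on the single log entry quoted in this thread;
# other Grid Engine versions may word the message differently (assumption).
ENTRY = re.compile(
    r"job (?P<job>\d+\.\d+) failed on host (?P<host>\S+)\s+"
    r"assumedly after job because: .*?signal (?P<signal>\w+)"
)


def summarize(lines):
    """Count 'assumedly after job' failures per host and per signal."""
    by_host, by_signal = Counter(), Counter()
    for line in lines:
        match = ENTRY.search(line)
        if match:
            by_host[match["host"]] += 1
            by_signal[match["signal"]] += 1
    return by_host, by_signal


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path to the messages file is passed as the first argument.
    with open(sys.argv[1]) as handle:
        hosts, signals = summarize(handle)
    for host, count in hosts.most_common(10):
        print(f"{count:6d}  {host}")
    for sig, count in signals.most_common():
        print(f"{count:6d}  signal {sig}")
```

If a handful of hosts account for most of the entries, that would point
at the nodes rather than at the users' jobs.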

Iwona

> 
>>in none of his jobs any kill is issued.
>>
>>Iwona
>>
>>
>>
>>> -Ron
>>>
>>>
>>>--- Iwona Sakrejda <isakrejda at lbl.gov> wrote:
>>>
>>>
>>>>Hi,
>>>>
>>>>Where can I look up error codes shown after the
>>>>"failed" entry from qacct -j?
>>>>
>>>>Some of the jobs are failing with a message:
>>>>"failed       100 : assumedly after job"
>>>>
>>>>I was trying to find some more info about them and
>>>>the qmaster/messages file
>>>>has the following entry for this job:
>>>>
>>>>Tue Apr  5 12:29:51 2005|qmaster|pdsfcore03|W|job
>>>>404796.1 failed on host pc2515.nersc.gov 
>>>>assumedly after job because: job 404796.1 died
>>>>through signal KILL (9)
>>>>
>>>>That did not explain much either. The job runs just
>>>>fine if resubmitted.
>>>>How can I figure out why those jobs are dying?
>>>>
>>>>Any assistance in sorting this out would be
>>>>appreciated.....
>>>>
>>>>Iwona Sakrejda
>>>>
>>>>
>>>>
>>>
>>>---------------------------------------------------------------------
>>>
>>>
>>>>To unsubscribe, e-mail:
>>>>users-unsubscribe at gridengine.sunsource.net
>>>>For additional commands, e-mail:
>>>>users-help at gridengine.sunsource.net
>>>>

