[GE users] ptf complains

Iwona Sakrejda isakrejda at lbl.gov
Thu Aug 17 18:42:22 BST 2006



Iwona Sakrejda wrote:

> Hi,
> 
> Every now and then one of the execution hosts in my cluster "goes bad".
> I cannot find anything wrong with the host itself, yet jobs
> keep dying with an exit code of 255 right at startup, and the
> only sign of trouble is the following message:
> 08/17/2006 02:05:21|execd|pc2408|W|reaping job "1337890" ptf complains: Job does not exist
> 
> What is the meaning of this message? Understanding the origin
> of this complaint might help me solve my problem.
> 
> Thanks a lot,
> 
> Iwona
> 


I did some more checking and searched the archives. Others have reported
a similar problem with the same version I am running, 6.0u4, but I could
not find out why it happened or how it was solved.

In the master log I also found entries for this host:
08/15/2006 22:44:02|qmaster|pdsfcore03|E|commlib error: got read error (closing "pc2408.nersc.gov/qstat/29343")
(but I am not sure it is related)

The master log also has entries related to those dying jobs:
08/16/2006 23:57:23|qmaster|pdsfcore03|W|job 1337772.1 failed on host pc2408.nersc.gov assumedly after job because: job 1337772.1 died through signal BUS (7)
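
To collect every entry that mentions the affected host from both the execd and qmaster logs, something like the following could be used. It is only a sketch: it assumes each entry sits on a single line with the five pipe-separated fields seen in the excerpts above (timestamp, daemon, logging host, level, message). The function names parse_line and entries_mentioning are made up for illustration and are not part of Grid Engine.

```python
def parse_line(line):
    """Split one log line into its five pipe-separated fields.

    Returns None when the line does not have the expected layout.
    """
    # maxsplit=4 keeps any "|" inside the message text intact.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, daemon, host, level, message = parts
    return {
        "timestamp": timestamp,
        "daemon": daemon,
        "host": host,
        "level": level,
        "message": message,
    }


def entries_mentioning(lines, text):
    """Return parsed entries whose logging host or message contains `text`."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and (text in entry["host"] or text in entry["message"]):
            hits.append(entry)
    return hits


# The three entries quoted in this message, each on one line.
sample = [
    '08/17/2006 02:05:21|execd|pc2408|W|reaping job "1337890" ptf complains: Job does not exist',
    '08/15/2006 22:44:02|qmaster|pdsfcore03|E|commlib error: got read error (closing "pc2408.nersc.gov/qstat/29343")',
    '08/16/2006 23:57:23|qmaster|pdsfcore03|W|job 1337772.1 failed on host pc2408.nersc.gov assumedly after job because: job 1337772.1 died through signal BUS (7)',
]

for entry in entries_mentioning(sample, "pc2408"):
    print(entry["timestamp"], entry["daemon"], entry["level"], entry["message"])
```

In practice the sample list would be replaced by the lines read from the actual log files. Matching on the message as well as the host field matters here, because the qmaster entries are logged under pdsfcore03 and only name pc2408 in the message text.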

