[GE users] shepherd of job 388421.1 died through signal = 11

Kelly Felkins kfelkins at lbl.gov
Tue Nov 15 19:47:17 GMT 2005


    11/15/2005 11:30:26|execd|node64t-10|E|shepherd of job 388421.1 died through signal = 11


I'm seeing this error in the messages files for specific nodes on our 
cluster. At the moment we have a large array job running, so there are 
similar jobs on nearly every node. A handful of the nodes hit this error,
and then their queue goes into error state. If I clear the error, another
task is soon started on the node, hits the same error, and the queue goes
back into error state.
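To see which nodes are affected and how often, the messages files can be tallied per host. Signal 11 is SIGSEGV, meaning the shepherd process hit a segmentation fault. Below is a minimal Python sketch. It assumes each record follows the pipe-delimited layout of the line quoted above (timestamp|daemon|host|severity|message) and that the message wording matches that one line. The function name `shepherd_deaths` is hypothetical and is not part of Grid Engine.

```python
import re
import signal
from collections import Counter

# Message text copied from the single execd log line quoted above;
# the exact wording is an assumption and may differ between versions.
SHEPHERD_RE = re.compile(r"shepherd of job (\S+) died through signal = (\d+)")


def shepherd_deaths(lines):
    """Count shepherd deaths per (host, signal name).

    Assumes each record is pipe-delimited as
    timestamp|daemon|host|severity|message.
    """
    counts = Counter()
    for line in lines:
        # maxsplit=4 keeps any '|' inside the message text intact
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) != 5:
            continue  # not a record in the assumed layout
        _timestamp, _daemon, host, _severity, message = parts
        match = SHEPHERD_RE.search(message)
        if match is None:
            continue
        signum = int(match.group(2))
        try:
            name = signal.Signals(signum).name  # 11 maps to SIGSEGV on Linux
        except ValueError:
            name = f"signal {signum}"  # number unknown on this platform
        counts[(host, name)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical input: the quoted log line as it would appear unwrapped.
    sample = [
        "11/15/2005 11:30:26|execd|node64t-10|E|"
        "shepherd of job 388421.1 died through signal = 11",
    ]
    for (host, sig), n in shepherd_deaths(sample).most_common():
        print(host, sig, n)
```

If the same handful of hosts dominate the counts while identical tasks succeed elsewhere, that points toward something node-specific rather than the job itself.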

Please help me diagnose this problem.

Thank you.

-Kelly

