[GE users] FAQ Problem With A New Twist

Langston, Chris Chris.Langston at aa.com
Sat Mar 17 17:17:25 GMT 2007


Hi Reuti,

These are new servers and we have disk monitoring that alerts us if a
file system fills up, so I don't think that would be the cause. I'm not
sure whether it's always on the same node, but that's a good thing to
check. I can't imagine any process that would be killing the shepherd
process, but it's always a possibility, and I'm not sure how to verify
that. Now that, with your help, I have a better understanding of how
qrsh gets the return code, maybe I can put in watches to see what might
be interfering.
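
One way to answer the "same node?" question is to tally failures per
host. The following is only a minimal sketch: it assumes we already
collect (host, stderr text) pairs for each qrsh run ourselves. That
record format and the function name are hypothetical and are not
something Grid Engine provides.

```python
from collections import Counter

# Error text as quoted in this thread.
QRSH_ERROR = "error reading returncode of remote command"


def count_failures_by_host(records):
    """Count qrsh return-code failures per host.

    `records` is an iterable of (host, stderr_text) pairs. This format
    is an assumption for illustration only.
    """
    failures = Counter()
    for host, stderr_text in records:
        if QRSH_ERROR in stderr_text:
            failures[host] += 1
    return failures
```

Calling count_failures_by_host(records).most_common() would list the
hosts with the most failures first, which should show quickly whether
one node stands out.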

Thanks for your help,
Chris

-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Saturday, March 17, 2007 9:09 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] FAQ Problem With A New Twist

Hi,

On 16.03.2007 at 19:14, Langston, Chris wrote:
> We are having an intermittent issue with jobs submitted with qrsh.  
> We are getting an error message: "error: error reading returncode  
> of remote command"  when submitting jobs with qrsh.
The qrsh_starter on the node tries to write the files qrsh_exit_code
and qrsh_error there. So: is this always happening on the same nodes,
e.g. ones with a full disk? Is there any chance that the shepherd is
killed by some program before the started task exits?
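
The full-disk question can be checked directly on a node. This is a
rough sketch, assuming Python is available there. The function name and
the 95% threshold are illustrative choices, and the directory to check
(wherever the node writes those files) has to be supplied by the
caller, since its location depends on the local setup.

```python
import shutil


def is_nearly_full(path, threshold=0.95):
    """Return True if the file system holding `path` is at least
    `threshold` (a fraction from 0 to 1) full."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo file systems report a size of zero; treat as not full.
        return False
    return usage.used / usage.total >= threshold
```

Running this against the relevant directory on each suspect node, at
the time a failure occurs, would confirm or rule out the disk theory.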

-- Reuti
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net

