[GE users] Error message: Can't read usage file
Petra.Kogel at ecmwf.int
Wed Aug 22 16:22:06 BST 2007
Thanks for this; I'll pursue it.
Harald Pollinger wrote:
> To reproduce this error, just "kill -9" the sge_shepherd of the job.
> It then has no chance to write the usage file, and the execd will search
> for it in vain.
> So my guess is that the sge_shepherd dies; it may leave a core dump if
> your system is configured to write one.
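The mechanism Harald describes can be mimicked with a toy model. This Python sketch is not SGE code: a child process stands in for sge_shepherd and writes a placeholder usage file only if it runs to completion, while the parent plays the execd and reads that file afterwards. Killing the child with SIGKILL (the equivalent of "kill -9") leaves no file behind, so the read fails with the same kind of "can't open usage file" message seen in our logs. The file name, its "exit_status=0" content and the timings are assumptions chosen purely for illustration.

```python
import os
import subprocess
import sys
import tempfile
import time

# Toy stand-in for sge_shepherd (NOT real SGE code): it "runs the job",
# then writes a placeholder usage file only if it reaches the end.
SHEPHERD = r'''
import sys, time
time.sleep(float(sys.argv[2]))      # pretend the job is running
with open(sys.argv[1], "w") as f:   # only reached on a clean exit
    f.write("exit_status=0\n")
'''

def run_job(usage_path, job_seconds, kill_after=None):
    """Start the toy shepherd; optionally SIGKILL it, like `kill -9`."""
    proc = subprocess.Popen(
        [sys.executable, "-c", SHEPHERD, usage_path, str(job_seconds)])
    if kill_after is not None:
        time.sleep(kill_after)
        proc.kill()   # SIGKILL on POSIX: no chance to write the usage file
    proc.wait()

def read_usage(usage_path, job_id):
    """Toy execd side: return the usage data or an execd-style error string."""
    try:
        with open(usage_path) as f:
            return f.read()
    except FileNotFoundError as e:
        return f'can\'t open usage file "{usage_path}" for job {job_id}: {e.strerror}'
```

For example, inside a temporary directory, run_job(path, 5, kill_after=0.5) followed by read_usage(path, "1882417.1") returns an error string rather than usage data (ending in "No such file or directory" on Linux), whereas run_job(path, 0.1) leaves a readable file.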
> Petra Kogel wrote:
>> From time to time, we have jobs "disappearing" without leaving an output
>> or error file. These jobs run fine if re-submitted. When they fail:
>> - they execute our custom prolog, leaving a start timestamp in
>> our custom log
>> - they execute our custom epilog, leaving an end timestamp in
>> our custom log
>> - they log an error in the node's local messages file, for example
>> 08/19/2007 07:01:12|execd|bee-ge08|E|can't open usage file
>> "active_jobs/1882417.1/usage" for job 1882417.1: No such file or
>> 08/19/2007 07:01:12|execd|bee-ge08|E|can't read usage file for job
>> - they log an error in the qmaster messages file, for example
>> 08/19/2007 07:01:12|qmaster|swarm-ge|W|job 1882417.1 failed on host
>> bee-ge08 assumedly after job because: can't read usage file for job
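The execd and qmaster examples above share one layout: date and time, daemon, host, severity letter and message text, separated by "|". Assuming that layout holds for every line (it is inferred from these samples, not taken from SGE documentation), a small Python sketch can collect the job IDs behind usage-file errors, for instance to line them up against the prolog/epilog log. Lines without all five fields, or with an unparseable date, are skipped.

```python
import re
from datetime import datetime

# First "job <id>" or "job <id>.<task>" in the message text; the \b keeps
# paths such as "active_jobs/1882417.1/usage" from matching.
JOB_ID = re.compile(r"\bjob (\d+(?:\.\d+)?)")

def parse_line(line):
    """Split a messages-file line into (time, daemon, host, level, text)."""
    stamp, daemon, host, level, text = line.rstrip("\n").split("|", 4)
    when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    return when, daemon, host, level, text

def usage_file_failures(lines):
    """Yield (time, daemon, host, job_id) for lines mentioning the usage file."""
    for line in lines:
        if line.count("|") < 4:
            continue  # not a complete five-field line
        try:
            when, daemon, host, level, text = parse_line(line)
        except ValueError:
            continue  # date field did not match the assumed format
        if "usage file" not in text:
            continue
        match = JOB_ID.search(text)
        if match:
            yield when, daemon, host, match.group(1)
```

Passing an open messages file (any iterable of lines) to usage_file_failures yields one tuple per matching line; lines that mention the usage file but carry no job ID are ignored.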
>> For these "disappearing" jobs, the time between start and end as logged
>> by the prolog/epilog is usually one second or less; sometimes both
>> timestamps are identical. Normally, these jobs would take several
>> minutes to complete.
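The prolog/epilog comparison can be automated in the same way. The format of our custom log is not shown here, so this Python sketch assumes a hypothetical one-record-per-line layout, "START <job_id> <ISO timestamp>" written by the prolog and "END <job_id> <ISO timestamp>" written by the epilog, and flags jobs whose epilog ran within one second of their prolog.

```python
from datetime import datetime, timedelta

def pair_timestamps(lines):
    """Collect START/END stamps per job.

    Assumes a HYPOTHETICAL format: '<START|END> <job_id> <ISO timestamp>'.
    """
    runs = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        kind, job, stamp = parts
        runs.setdefault(job, {})[kind] = datetime.fromisoformat(stamp)
    return runs

def suspiciously_short(runs, threshold=timedelta(seconds=1)):
    """Job IDs whose END is at most `threshold` after START (both present)."""
    return sorted(
        job for job, t in runs.items()
        if "START" in t and "END" in t and t["END"] - t["START"] <= threshold
    )
```

Jobs that are still running (a START with no END yet) are left out, so the result lists only completed jobs with a run time of one second or less.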
>> Does anybody know what could provoke this error message, or what
>> could be happening to the jobs?
>> Our installation is SGE 6.0u8 on a SuSE Linux cluster.
>> Many thanks for your help,
Petra Kogel, Senior Systems Analyst, Servers & Desktops Section
European Centre for Medium-Range Weather Forecasts (ECMWF)
Shinfield Park, Reading, Berkshire, RG2 9AX, UK (http://www.ecmwf.int)
Email: pkogel at ecmwf.int Telephone: (++44) 118 9499364