[GE users] Error message: Can't read usage file

Petra Kogel Petra.Kogel at ecmwf.int
Wed Aug 22 16:22:06 BST 2007


Harald,

Thanks for this; I'll look into it.

Kind regards,
Petra

Harald Pollinger wrote:
> To reproduce this error, just "kill -9" the sge_shepherd of the job.
> It then has no chance to write the usage file, and the execd will
> search for it in vain.
> 
> So my guess is that the sge_shepherd dies; it could leave a core dump
> if your system is configured to write one.
> 
> Regards,
> Harald
> 
> 
> Petra Kogel wrote:
>> Hi,
>>
>> From time to time, we have jobs "disappearing" without leaving an
>> output or error file. These jobs run fine if resubmitted. When they
>> fail:
>>
>> - they execute our custom prolog, leaving a start time stamp in
>>   our custom log
>> - they execute our custom epilog, leaving an end time stamp in
>>   our custom log
>> - they log an error in the node's local messages file, for example
>>
>> 08/19/2007 07:01:12|execd|bee-ge08|E|can't open usage file "active_jobs/1882417.1/usage" for job 1882417.1: No such file or directory
>>
>> 08/19/2007 07:01:12|execd|bee-ge08|E|can't read usage file for job 1882417.1
>>
>> - they log an error in the qmaster messages file, for example
>>
>> 08/19/2007 07:01:12|qmaster|swarm-ge|W|job 1882417.1 failed on host bee-ge08 assumedly after job because: can't read usage file for job 1882417.1
>>
>> For these "disappearing" jobs, the time difference between start
>> and end as logged by prolog/epilog is usually one second (if that;
>> sometimes both timestamps are the same). Normally, these jobs
>> would take several minutes to execute and complete.
>>
>> Does anybody know what could cause this error message, or what
>> could be happening to the jobs?
>>
>> Our installation is SGE 6.0u8 on a SuSE Linux cluster.
>>
>> Many thanks for your help,
>>
>> Petra
>>
>>
>>
> 
> 
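
As a rough aid for finding the affected jobs, here is a minimal Python
sketch that scans messages-file lines for the errors quoted above. It
assumes each entry is a single line with five "|"-separated fields
(timestamp, daemon, host, level, message), as the quoted examples
suggest. The names parse_line, jobs_missing_usage and SAMPLE are
illustrative only and are not part of Grid Engine.

```python
import re
from collections import namedtuple

# Assumed layout of one messages-file entry, inferred from the quoted lines.
LogEntry = namedtuple("LogEntry", "timestamp daemon host level message")

# Matches the execd error and the qmaster warning that embeds the same text.
USAGE_ERROR = re.compile(r"can't read usage file for job (\d+\.\d+)")


def parse_line(line):
    """Split one log line into its five fields, or return None if malformed."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return LogEntry(*parts)


def jobs_missing_usage(lines):
    """Return job ids (in first-seen order) that hit the usage-file error."""
    jobs = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        match = USAGE_ERROR.search(entry.message)
        if match and match.group(1) not in jobs:
            jobs.append(match.group(1))
    return jobs


# Sample entries copied from the messages quoted in this thread.
SAMPLE = [
    "08/19/2007 07:01:12|execd|bee-ge08|E|can't open usage file "
    "\"active_jobs/1882417.1/usage\" for job 1882417.1: "
    "No such file or directory",
    "08/19/2007 07:01:12|execd|bee-ge08|E|can't read usage file "
    "for job 1882417.1",
    "08/19/2007 07:01:12|qmaster|swarm-ge|W|job 1882417.1 failed on host "
    "bee-ge08 assumedly after job because: can't read usage file "
    "for job 1882417.1",
]

if __name__ == "__main__":
    print(jobs_missing_usage(SAMPLE))  # ['1882417.1']
```

To use it on a real file, pass an open file object instead of SAMPLE;
the resulting job ids can then be matched against the prolog/epilog
time stamps to confirm the one-second runtimes.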

-- 

Petra Kogel, Senior Systems Analyst, Servers & Desktops Section
European Centre for Medium-Range Weather Forecasts (ECMWF)
Shinfield Park, Reading, Berkshire, RG2 9AX, UK (http://www.ecmwf.int)
Email: pkogel at ecmwf.int Telephone: (++44) 118 9499364
