[GE users] Error message: Can't read usage file

Harald Pollinger Harald.Pollinger at Sun.COM
Wed Aug 22 15:37:03 BST 2007


To reproduce this error, just "kill -9" the sge_shepherd of the job.
The shepherd then has no chance to write the usage file, and the execd
will search for it in vain.

So my guess is that the sge_shepherd of these jobs is dying. If your
system is configured to write core dumps, it may have left one behind.
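
To see which jobs were affected, and when, you could scan the execd
message file for the error. The following is only a minimal sketch in
Python: the message format is taken from the log lines quoted below, the
function name find_usage_errors is made up for illustration, and the
pattern may need adjusting for other Grid Engine versions.

```python
import re

# Matches the execd error quoted in the original report, for example:
# 08/19/2007 07:01:12|execd|bee-ge08|E|can't read usage file for job 1882417.1
# Assumption: timestamp|execd|host|E|message, as in the quoted log lines.
USAGE_ERROR = re.compile(
    r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|execd\|([^|]+)\|E\|"
    r"can't read usage file for job (\d+\.\d+)"
)


def find_usage_errors(text):
    """Return a list of (timestamp, host, job_id) tuples, one per error."""
    # Collapse all whitespace so that messages wrapped across lines
    # (as they appear in e-mail) are still matched.
    normalized = " ".join(text.split())
    return USAGE_ERROR.findall(normalized)


# Sample input copied from the log lines quoted in this thread.
sample = (
    "08/19/2007 07:01:12|execd|bee-ge08|E|can't open usage file \n"
    "\"active_jobs/1882417.1/usage\" for job 1882417.1: "
    "No such file or directory\n"
    "08/19/2007 07:01:12|execd|bee-ge08|E|can't read usage file for job \n"
    "1882417.1\n"
)

for stamp, host, job in find_usage_errors(sample):
    print(stamp, host, job)
```

With the time and host of each failure, you can then look for core files
or other evidence of a killed process on that node around that moment.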

Regards,
Harald


Petra Kogel wrote:
> Hi,
> 
> from time to time, we have jobs "disappearing" without leaving an output
> or error file. These jobs run fine if re-submitted. When they fail,
> 
> - they execute our custom prolog, leaving a start time stamp in
>   our custom log
> - they execute our custom epilog, leaving an end time stamp in
>   our custom log
> - they log an error on the node's local message file, for example
> 
> 08/19/2007 07:01:12|execd|bee-ge08|E|can't open usage file 
> "active_jobs/1882417.1/usage" for job 1882417.1: No such file or directory
> 
> 08/19/2007 07:01:12|execd|bee-ge08|E|can't read usage file for job 
> 1882417.1
> 
> - they log an error in the qmaster messages file, for example
> 
> 08/19/2007 07:01:12|qmaster|swarm-ge|W|job 1882417.1 failed on host 
> bee-ge08 assumedly after job because: can't read usage file for job 
> 1882417.1
> 
> For these "disappearing jobs", the time difference between start
> and end as logged by prolog/epilog is usually one second (if that,
> sometimes both timestamps are the same). Normally, these jobs
> would take several minutes to execute and complete.
> 
> Would anybody know what could provoke this error message / what
> could be happening to the jobs?
> 
> Our installation is sge6.0u8 on a SuSE linux cluster.
> 
> Many thanks for your help,
> 
> Petra
> 
> 
> 


-- 
Sun Microsystems GmbH         Harald Pollinger
Dr.-Leo-Ritter-Str. 7         N1 Grid Engine Engineering
D-93049 Regensburg            Phone: +49 (0)941 3075-209  (x60209)
Germany                       Fax: +49 (0)941 3075-222  (x60222)
http://www.sun.com/gridware
mailto:harald.pollinger at sun.com
Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1,
D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
