[GE users] Error message: Can't read usage file

Petra Kogel Petra.Kogel at ecmwf.int
Wed Aug 22 11:15:07 BST 2007



Hi,

From time to time, we have jobs "disappearing" without leaving an output
or error file. These jobs run fine if re-submitted. When they fail:

- they execute our custom prolog, leaving a start time stamp in
   our custom log
- they execute our custom epilog, leaving an end time stamp in
   our custom log
- they log an error in the node's local messages file, for example

08/19/2007 07:01:12|execd|bee-ge08|E|can't open usage file "active_jobs/1882417.1/usage" for job 1882417.1: No such file or directory

08/19/2007 07:01:12|execd|bee-ge08|E|can't read usage file for job 1882417.1

- they log an error in the qmaster messages file, for example

08/19/2007 07:01:12|qmaster|swarm-ge|W|job 1882417.1 failed on host bee-ge08 assumedly after job because: can't read usage file for job 1882417.1
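
To collect every affected job from a messages file, the pipe-separated
lines quoted above can be scanned with a short script. This is only a
minimal sketch: it assumes each entry sits on a single line with the five
fields seen in the examples (timestamp, daemon, host, level, message), and
the function name find_usage_errors is made up for illustration.

```python
import re
from datetime import datetime

# Matches the error text quoted above and captures job id and task id.
USAGE_ERROR = re.compile(r"can't read usage file for job (\d+)\.(\d+)")


def find_usage_errors(lines):
    """Return (timestamp, host, job_id, task_id) for each matching line."""
    hits = []
    for line in lines:
        # Assumed layout: timestamp|daemon|host|level|message
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) != 5:
            continue  # wrapped or malformed line, skip it
        stamp, _daemon, host, _level, message = parts
        match = USAGE_ERROR.search(message)
        if match is None:
            continue
        when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        hits.append((when, host, int(match.group(1)), int(match.group(2))))
    return hits


# The two execd lines from the post, each joined onto one line.
sample = [
    "08/19/2007 07:01:12|execd|bee-ge08|E|can't open usage file "
    "\"active_jobs/1882417.1/usage\" for job 1882417.1: No such file or directory",
    "08/19/2007 07:01:12|execd|bee-ge08|E|can't read usage file for job 1882417.1",
]

for when, host, job_id, task_id in find_usage_errors(sample):
    print(when.isoformat(), host, f"{job_id}.{task_id}")
```

Grouping the results by host or by hour might show whether the failures
cluster on particular nodes or at particular times.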

For these "disappearing jobs", the time difference between start
and end as logged by prolog/epilog is usually one second at most
(sometimes both timestamps are identical). Normally, these jobs
would take several minutes to run to completion.
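
The same check can be automated once the prolog/epilog timestamps are
available per job. The sketch below assumes they have already been parsed
from the custom log into a dict mapping job id to a (start, end) pair; the
log format itself is site-specific and not shown here, and the function
name and the second sample entry are hypothetical.

```python
from datetime import datetime, timedelta


def suspiciously_short(runs, threshold=timedelta(seconds=1)):
    """Return the job ids whose end - start is at most `threshold`, sorted."""
    return sorted(
        job for job, (start, end) in runs.items() if end - start <= threshold
    )


# Illustrative data only: the first entry mirrors the failing job above
# (identical timestamps), the second is a made-up job that ran for minutes.
runs = {
    "1882417.1": (datetime(2007, 8, 19, 7, 1, 12), datetime(2007, 8, 19, 7, 1, 12)),
    "1882418.1": (datetime(2007, 8, 19, 7, 1, 12), datetime(2007, 8, 19, 7, 6, 40)),
}

print(suspiciously_short(runs))
```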

Does anybody know what could trigger this error message, or what
could be happening to the jobs?

Our installation is SGE 6.0u8 on a SuSE Linux cluster.

Many thanks for your help,

Petra



-- 

Petra Kogel, Senior Systems Analyst, Servers & Desktops Section
European Centre for Medium-Range Weather Forecasts (ECMWF)
Shinfield Park, Reading, Berkshire, RG2 9AX, UK (http://www.ecmwf.int)
Email: pkogel at ecmwf.int Telephone: (++44) 118 9499364
