[GE users] Unexpected behavior during simultaneous job submissions?

Jonathan Pierce jonathan.pierce at loni.ucla.edu
Fri Nov 14 01:03:13 GMT 2008

Hi All,

One of our users has a script that submits 494 jobs, which he ran this
morning; based on qstat, the first job was reported at 09:17:59 and
the last at 09:18:53. While most are now happily executing, 19 of
those jobs started out in a "zombie" state, so to speak. 'qstat -j
[jobID]' returns information, and 'qstat -f' shows the job in state
'r'. However, 'qacct -j [jobID]' reports the error "job id not found",
and a quick inspection of the node where the job is supposedly running
confirms that nothing is executing. Has anybody seen this behavior before?
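
For anyone who wants to repeat the check across many jobs, here is a minimal
sketch of the first half of it: pulling the jobs that qstat lists in state 'r',
together with the queue instance they are assigned to. It assumes the default
plain 'qstat' column layout (job-ID, prior, name, user, state, date, time,
queue, slots) and job names without spaces; the function name running_jobs and
the sample host names are hypothetical, not part of Grid Engine.

```python
def running_jobs(qstat_text):
    """Return {job_id: queue_instance} for jobs whose state column is 'r'.

    Assumes the default plain `qstat` layout:
    job-ID prior name user state date time queue slots
    """
    jobs = {}
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and blank lines.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        job_id, state, queue = fields[0], fields[4], fields[7]
        if state == "r":
            jobs[job_id] = queue
    return jobs


# Hypothetical sample output, for illustration only.
sample = """\
job-ID  prior   name       user         state submit/start at     queue          slots ja-task-ID
-------------------------------------------------------------------------------------------------
    101 0.55500 job_a      user1        r     11/14/2008 09:17:59 all.q@node01       1
    102 0.55500 job_b      user1        qw    11/14/2008 09:18:53                    1
"""

print(running_jobs(sample))
```

Each job returned this way could then be compared against the process list on
the host named in its queue instance to spot jobs that are marked 'r' but have
nothing running.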

Taking a step back, we've been discovering a number of zombie jobs
recently, most of which do not originate from this script. Is this
behavior indicative of some larger underlying failure?

Thank you very much,

Jonathan Pierce
Systems Administrator
Laboratory of Neuro Imaging, UCLA
635 Charles E. Young Drive South,
Suite 225 Los Angeles, CA 90095-7334
Tel: 310.267.5076
Fax: 310.206.5518
jonathan.pierce at loni.ucla.edu

