[GE users] Invisible Jobs

Rayson Ho raysonho at eseenet.com
Thu May 12 18:09:40 BST 2005


What does "qacct -j <job id>" show?

Also, by looking at the qacct output, you can find the execution host the
job runs on, and then you can examine the log file of the execd on that
host.
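
As a rough sketch of that lookup, the following Python pulls the execution
host out of the qacct output. It assumes the usual "key   value" record
layout with "=====" separator lines and a field named "hostname"; the
helper names parse_qacct and execution_hosts are made up for illustration,
and qacct is assumed to be on the PATH.

```python
import subprocess


def parse_qacct(text):
    """Split 'qacct -j' style output into a list of {field: value} dicts.

    Assumes each record starts with a line of '=' characters and that
    every other line is a field name followed by whitespace and a value.
    """
    records = []
    current = {}
    for line in text.splitlines():
        if line.startswith("====="):
            # Separator line: close the record collected so far, if any.
            if current:
                records.append(current)
                current = {}
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            current[parts[0]] = parts[1].strip()
    if current:
        records.append(current)
    return records


def execution_hosts(job_id):
    """Run 'qacct -j <job id>' and return the sorted set of hosts it reports."""
    out = subprocess.run(
        ["qacct", "-j", str(job_id)],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return sorted({r["hostname"] for r in parse_qacct(out) if "hostname" in r})
```

With the host name in hand, the execd log to read is normally the
"messages" file in that host's execd spool directory. Where that directory
lives depends on how spooling was set up at install time, so check the
local configuration rather than assuming a path.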

Rayson



>Using 6.0u3. We've had a lot of NFS craziness and a server loss due to a
>power failure and other craziness lately. I just realized that once jobs
>are submitted using qsub and are running, that they become invisible to
>qmon and to qstat. What I thought was a completely idle cluster is
>actually a beehive of invisible jobs running.
>
>qstat -f shows the queue instances and their loading, but not the jobs.
>
>Since things like queue subordination and slots are being ignored, our
>nodes are getting way oversubscribed. So many of the jobs currently
>running will run for a LONG time.
>
>Any idea what typically causes this to happen? How to prevent it?
>
>Any suggestion in laymen's terms what to ask the IT folks to do to fix
>it? Hopefully once and for all!
>
>Any way to make these jobs visible again and controllable by qmon?
>
>Or do we just have to wait for all jobs to complete (or go to all the
>individual nodes and kill them manually?) and then reboot everything?
>
>Thanks!
>Jim Marconnet
