[GE users] Invisible Jobs

Jim Marconnet jmarconnet at knology.net
Thu May 12 16:59:20 BST 2005



Using 6.0u3. We've had a lot of NFS craziness lately, plus the loss of a
server to a power failure. I just realized that once jobs are submitted
using qsub and are running, they become invisible to qmon and to qstat.
What I thought was a completely idle cluster is actually a beehive of
invisible running jobs.

qstat -f shows the queue instances and their loading, but not the jobs.

Since things like queue subordination and slots are being ignored, our nodes
are getting way oversubscribed. As a result, many of the jobs currently
running will take a LONG time to finish.

Any idea what typically causes this to happen? How to prevent it?

Any suggestion in layman's terms what to ask the IT folks to do to fix it?
Hopefully once and for all!

Any way to make these jobs visible again and controllable by qmon?

Or do we just have to wait for all jobs to complete (or go to all the
individual nodes and kill them manually?) and then reboot everything?
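
For the manual route, here is a rough sketch of how the job IDs still alive
on a node might be listed. It assumes each running job sits under a process
whose command line starts with "sge_shepherd-<jobid>", which is how the
shepherds usually show up in ps but may differ on a given install. The
function name list_shepherd_jobs is made up for illustration.

```shell
# Read `ps` output on stdin and print the job IDs of any shepherd processes.
# Assumption: shepherds appear as "sge_shepherd-<jobid>" in the command column.
list_shepherd_jobs() {
    grep -o 'sge_shepherd-[0-9][0-9]*' \
        | sed 's/^sge_shepherd-//' \
        | sort -un    # numeric order, duplicates dropped
}

# Possible use on one node (not run here):
#   ps -e -o args= | list_shepherd_jobs
```

Any ID listed here that qstat does not report would be one of the invisible
jobs on that node.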

Thanks!
Jim Marconnet








