[GE users] SGE-6.X: Admin jobs on empty hosts.

Erik Soyez E.Soyez at science-computing.de
Thu May 8 08:52:41 BST 2008


Good morning list,

I have what one would think is a very basic request:

I want to run reboot jobs via SGE on all hosts that have no jobs
running.

Our cluster (~100 hosts, ~300 CPUs, ~10 departments) has a rather
complex configuration: subordinate queues for some hostgroups,
suspend_thresholds for others, some have different queues with
different numbers of slots, fewer slots than CPUs on workstations,
more slots than CPUs where suspension is set up, and there are even
guest queues with fewer slots and different priorities.  It is very
inhomogeneous.

I could set up a load sensor with a requestable "nojobs" resource,
but that seems like overkill for such a simple task, and it's not
watertight at all due to race conditions.
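
For reference, a minimal sketch of what such a load sensor could look
like, written in Python.  It follows the usual load sensor protocol
(the execd sends a line on stdin for each report it wants and "quit"
to stop; the sensor answers with a begin/end block).  The resource
name "nojobs", the function names and the idea of counting entries in
the execd's active_jobs spool directory are assumptions made for
illustration, not a tested setup.  The value is only refreshed once
per load report, which is exactly where the race condition comes from.

```python
import os


def count_active_jobs(active_jobs_dir):
    """Count entries in the execd's active_jobs directory (assumed layout)."""
    try:
        return len(os.listdir(active_jobs_dir))
    except FileNotFoundError:
        return 0


def format_report(host, name, value):
    """Build one load report block: begin / host:name:value / end."""
    return "begin\n{}:{}:{}\nend\n".format(host, name, value)


def serve(inp, out, host, probe):
    """Answer every request line from the execd until "quit" or EOF.

    probe is a callable returning the number of jobs currently on the host.
    """
    for line in iter(inp.readline, ""):
        if line.strip() == "quit":
            break
        # Hypothetical resource: nojobs=1 when nothing is running, else 0.
        nojobs = 1 if probe() == 0 else 0
        out.write(format_report(host, "nojobs", nojobs))
        out.flush()
```

To try it as a real sensor one would call serve(sys.stdin, sys.stdout,
socket.gethostname(), lambda: count_active_jobs(path)) with the actual
spool path of the host, and define "nojobs" as a requestable complex.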

Even if I had a dedicated admin queue, I could not be sure that no
other jobs were running in different queues.  Requesting a load of
0.0 wouldn't help either: a load of 0.0 sometimes occurs on hosts
whose parallel jobs are waiting for slower nodes.

On other clusters, which have only one queue per host, requesting all
slots of a host through a PE with the allocation rule $pe_slots always
works fine.  But for this one I haven't come up with anything
practical yet.
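
The single-queue approach can be sketched roughly as follows, as a
Python helper that only builds the qsub command line for one host.
The PE name "admin_pe", the script path and the host/slot table are
hypothetical; the PE is assumed to use allocation_rule $pe_slots so
that all requested slots must come from the one host selected with
-l hostname=...

```python
def reboot_job_command(host, slots, pe_name="admin_pe",
                       script="/usr/local/sbin/reboot_job.sh"):
    """Return the qsub argument list for a job that fills one whole host.

    pe_name and script are placeholders; slots must equal the total
    number of slots configured on that host for the job to own it.
    """
    return [
        "qsub",
        "-b", "y",                          # submit the script as a binary
        "-pe", pe_name, str(slots),         # request every slot of the host
        "-l", "hostname={}".format(host),   # pin the job to this host
        script,
    ]


def reboot_job_commands(slots_per_host):
    """Build one command per host from a {host: slot_count} mapping."""
    return [reboot_job_command(h, n) for h, n in sorted(slots_per_host.items())]
```

Each list could be handed to subprocess.run() on a submit host.  This
only works where a single queue holds all slots of a host, which is
why it does not carry over to the cluster described above.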

Any ideas for a simple and reliable method for rebooting (and not
only rebooting...)?

Many many thanks!

Erik Soyez.




