[GE users] SGE-6.X: Admin jobs on empty hosts.

Reuti reuti at staff.uni-marburg.de
Thu May 8 08:59:18 BST 2008


Hi,

On 08.05.2008 at 09:52, Erik Soyez wrote:

> I have a very basic request (one should think):
>
> I want to run reboot jobs via SGE on all hosts that have no jobs
> running.
>
> Our cluster (~100 hosts, ~300 cpus, ~10 departments) has a rather
> complex configuration with subordinate queues for some hostgroups,
> suspend_thresholds for others, some have different queues with
> different numbers of slots, fewer slots than cpus on workstations,
> more slots than cpus in cases of suspend-setups, they even have
> guest queues with fewer slots and different priorities etc.  Very
> inhomogeneous.
>
> I could set up some loadsensor with a "nojobs" requestable resource,
> but that seems a bit overkill for such a simple task and it's not
> watertight at all due to race conditions.
>
> Even if I had a specific admin queue I could not be sure that there
> weren't any other jobs running in different queues.  Requesting a
> load of 0.0 wouldn't help either, that happens sometimes with
> parallel jobs waiting for slower nodes.
>
> On other clusters which have only one queue on each host it always
> works fine by requesting all slots of each host and a PE with
> $pe_slots.  But on this one I haven't had any practical ideas yet.

Do you avoid oversubscription, either with an RQS or by setting the
available slots in each exechost's configuration to the number of
installed cores? Then it should work in the same way, even with many
queues on each machine.

-- Reuti

> Any ideas for a simple and reliable reboot (and not only reboot....)
> method?
>
> Many many thanks!
>
> Erik Soyez.
>
>
> -- 
> Vorstand/Board of Management:
> Dr. Bernd Finkbeiner, Dr. Florian Geyer,
> Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech
> Vorsitzender des Aufsichtsrats/
> Chairman of the Supervisory Board:
> Prof. Dr. Hanns Ruder
> Sitz/Registered Office: Tuebingen
> Registergericht/Registration Court: Stuttgart
> Registernummer/Commercial Register No.: HRB 382196
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net

