[GE users] master node selection and $fill_up behaviour

murphygb brian.murphy at siemens.com
Wed Jul 21 11:44:57 BST 2010


> Hello,
> 
> (at least as far as I know and can tell) with SGE 6.0 a job's master
> node would be chosen based on the number of free CPUs (all other
> criteria such as seq_no being equal). In a cluster with 4-CPU machines,
> each 3-slot-fill_up-job would get its own machine as long as there were
> free machines left. Only after that would jobs be started spanning
> hosts.
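
(For reference, the kind of parallel environment assumed throughout this thread is a $fill_up PE roughly like the one below, as printed by "qconf -sp dmp". The attribute names are standard SGE PE attributes; the values shown are only illustrative guesses, not the actual site configuration.)

  pe_name            dmp
  slots              999
  user_lists         NONE
  xuser_lists        NONE
  start_proc_args    /bin/true
  stop_proc_args     /bin/true
  allocation_rule    $fill_up
  control_slaves     TRUE
  job_is_first_task  TRUE
  urgency_slots      min
  accounting_summary FALSE
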
> 
> With SGE 6.2 (6.2u5 here) this seems to have changed: as long as the
> cluster is completely empty, the old behaviour is still followed. But
> once jobs are already running, slot allocation becomes erratic: SGE
> seems to prefer filling up already-used machines over completely free ones.
> 
> Is this intentional, maybe even a feature?
> Can the old behaviour be reinstated?
> Is there some other explanation?
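
(I am not aware of a single switch that restores the 6.0 placement outright, but the scheduler attributes that govern how hosts are sorted are queue_sort_method and load_formula, so that is where one would start looking. The commands and attribute names below are standard; the values shown are just the stock defaults, not a verified fix.)

  # inspect the relevant scheduler settings
  qconf -ssconf | egrep 'queue_sort_method|load_formula|job_load_adjustments'

  # stock defaults usually look like:
  #   queue_sort_method      load
  #   load_formula           np_load_avg
  #   job_load_adjustments   np_load_avg=0.50

  # edit the scheduler configuration interactively
  qconf -msconf
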
> 
> A simple example: We start with an empty cluster of eight four-CPU
> machines and submit nine 3-slot fill_up jobs:
> 
> scmic at l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
> Your job 135 ("STDIN") has been submitted
> scmic at l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
> Your job 136 ("STDIN") has been submitted
> scmic at l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
> Your job 137 ("STDIN") has been submitted
> scmic at l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
> Your job 138 ("STDIN") has been submitted
> scmic at l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
> Your job 139 ("STDIN") has been submitted
> scmic at l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
> Your job 140 ("STDIN") has been submitted
> scmic at l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
> Your job 141 ("STDIN") has been submitted
> scmic at l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
> Your job 142 ("STDIN") has been submitted
> scmic at l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
> Your job 143 ("STDIN") has been submitted
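
(Equivalently, the nine submissions above can be scripted in one line; this is just shorthand for exactly the commands shown:)

  for i in $(seq 1 9); do echo sleep 100 | qsub -pe dmp 3; done
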
> 
> After that, the job distribution looks as follows:
> 
> scmic at l5-auto-du ~ $ qhost -j
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> l5-auto-du              -               -     -       -       -       -       -
> l5-node01               lx26-amd64      4  0.02    7.9G  370.0M    9.8G     0.0
>    job-ID  prior   name       user         state submit/start at     queue      master ja-task-ID
>    ----------------------------------------------------------------------------------------------
>        141 0.55500 STDIN      scmic        r     07/20/2010 15:53:03 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
> l5-node02               lx26-amd64      4  0.69    7.9G  339.4M    9.8G     0.0
>        139 0.55500 STDIN      scmic        r     07/20/2010 15:53:03 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
> l5-node03               lx26-amd64      4  0.03    7.9G  374.5M    9.8G     0.0
>        135 0.55500 STDIN      scmic        r     07/20/2010 15:53:01 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
>        143 0.55500 STDIN      scmic        r     07/20/2010 15:53:08 express-dm MASTER
> l5-node04               lx26-amd64      4  4.92    7.9G  955.4M    9.8G     0.0
>        137 0.55500 STDIN      scmic        r     07/20/2010 15:53:02 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
>        143 0.55500 STDIN      scmic        r     07/20/2010 15:53:08 express-dm SLAVE
> l5-node05               lx26-amd64      4  5.11    7.9G  951.3M    9.8G     0.0
>        140 0.55500 STDIN      scmic        r     07/20/2010 15:53:03 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
> l5-node06               lx26-amd64      4  0.03    7.9G  370.1M    9.8G     0.0
>        136 0.55500 STDIN      scmic        r     07/20/2010 15:53:02 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
> l5-node07               lx26-amd64      4  0.06    7.9G  381.3M    9.8G     0.0
>        138 0.55500 STDIN      scmic        r     07/20/2010 15:53:02 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
>        143 0.55500 STDIN      scmic        r     07/20/2010 15:53:08 express-dm SLAVE
> l5-node08               lx26-amd64      4  0.02    7.9G  347.8M    9.8G     0.0
>        142 0.55500 STDIN      scmic        r     07/20/2010 15:53:04 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
> 
> Then I free up node 6 by deleting job 136 and submit a new job:
> 
> scmic at l5-auto-du ~ $ qdel 136
> scmic has registered the job 136 for deletion
> scmic at l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
> Your job 144 ("STDIN") has been submitted
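
(A quick way to see where the master and slave tasks of the new job end up is qstat's per-task view, for example:)

  # list MASTER/SLAVE task placement per queue instance for this user
  qstat -g t -u scmic
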
> 
> I'd expect job 144 to be run on node 6. Instead it ends up distributed over nodes 5, 1 and 2:
> 
> scmic at l5-auto-du ~ $ qhost -j
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> l5-auto-du              -               -     -       -       -       -       -
> l5-node01               lx26-amd64      4  0.02    7.9G  370.0M    9.8G     0.0
>    job-ID  prior   name       user         state submit/start at     queue      master ja-task-ID
>    ----------------------------------------------------------------------------------------------
>        141 0.55500 STDIN      scmic        r     07/20/2010 15:53:03 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
>        144 0.55500 STDIN      scmic        r     07/20/2010 15:53:26 express-dm SLAVE
> l5-node02               lx26-amd64      4  0.69    7.9G  339.4M    9.8G     0.0
>        139 0.55500 STDIN      scmic        r     07/20/2010 15:53:03 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
>        144 0.55500 STDIN      scmic        r     07/20/2010 15:53:26 express-dm MASTER
> l5-node03               lx26-amd64      4  0.03    7.9G  374.5M    9.8G     0.0
>        135 0.55500 STDIN      scmic        r     07/20/2010 15:53:01 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
>        143 0.55500 STDIN      scmic        r     07/20/2010 15:53:08 express-dm MASTER
> l5-node04               lx26-amd64      4  4.92    7.9G  955.4M    9.8G     0.0
>        137 0.55500 STDIN      scmic        r     07/20/2010 15:53:02 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
>        143 0.55500 STDIN      scmic        r     07/20/2010 15:53:08 express-dm SLAVE
> l5-node05               lx26-amd64      4  5.11    7.9G  951.3M    9.8G     0.0
>        140 0.55500 STDIN      scmic        r     07/20/2010 15:53:03 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
>        144 0.55500 STDIN      scmic        r     07/20/2010 15:53:26 express-dm SLAVE
> l5-node06               lx26-amd64      4  0.03    7.9G  370.1M    9.8G     0.0
> l5-node07               lx26-amd64      4  0.06    7.9G  381.3M    9.8G     0.0
>        138 0.55500 STDIN      scmic        r     07/20/2010 15:53:02 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
>        143 0.55500 STDIN      scmic        r     07/20/2010 15:53:08 express-dm SLAVE
> l5-node08               lx26-amd64      4  0.02    7.9G  347.8M    9.8G     0.0
>        142 0.55500 STDIN      scmic        r     07/20/2010 15:53:04 express-dm MASTER
>                                                                      express-dm SLAVE
>                                                                      express-dm SLAVE
> Any help would be greatly appreciated.
> 
I see the exact same behavior.  No configuration change I have made has helped.  To keep jobs less fragmented, I force processor requests to multiples of 4 via launch scripts, since I need jobs to run on the fewest possible hosts to reduce communication between master and slaves.  So far nothing has worked.
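
That workaround boils down to a small wrapper that rounds the requested slot count up to the next multiple of 4 before calling qsub. A minimal sketch, assuming the same "dmp" PE as above and the job script passed as the remaining arguments (the script name and layout are hypothetical, not the actual site script):

  #!/bin/sh
  # usage: submit4.sh <requested-slots> <job-script> [further qsub args...]
  # Round the slot request up to the next multiple of 4 so a job tends to
  # occupy whole 4-CPU hosts rather than fragments of several of them.
  requested=$1
  shift
  slots=$(( (requested + 3) / 4 * 4 ))
  exec qsub -pe dmp "$slots" "$@"
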
> Thanks in advance,
> -- 
> Michael Weiser                science + computing ag
> Senior Systems Engineer       Geschaeftsstelle Duesseldorf
>                               Martinstrasse 47-55, Haus A
> phone: +49 211 302 708 32     D-40223 Duesseldorf
> fax:   +49 211 302 708 50     www.science-computing.de
> -- 
> Vorstand/Board of Management:
> Dr. Bernd Finkbeiner, Dr. Roland Niemeier, 
> Dr. Arno Steitz, Dr. Ingrid Zech
> Vorsitzender des Aufsichtsrats/
> Chairman of the Supervisory Board:
> Michel Lepert
> Sitz/Registered Office: Tuebingen
> Registergericht/Registration Court: Stuttgart
> Registernummer/Commercial Register No.: HRB 382196
