[GE users] master node selection and $fill_up behaviour

weiser m.weiser at science-computing.de
Tue Jul 20 15:06:55 BST 2010


Hello,

As far as I know and can tell, with SGE 6.0 a job's master node was
chosen based on the number of free CPUs (all other criteria such as
seq_no being equal). In a cluster of 4-CPU machines, each 3-slot
$fill_up job would therefore get a machine of its own for as long as
free machines were left; only after that would jobs be started
spanning hosts.
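
For reference, dmp is a plain distributed-memory PE using
allocation_rule $fill_up. Its definition looks roughly like this
(qconf -sp dmp, quoted from memory, so treat the slot count as
illustrative):

pe_name            dmp
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE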

With SGE 6.2 (we run 6.2u5) this seems to have changed: as long as the
cluster is completely empty, the old behaviour is still followed. But
as soon as some jobs are already running, slot allocation becomes
erratic: SGE now seems to prefer filling up partially used machines
over re-using completely free ones.

Is this intentional, maybe even a feature?
Can the old behaviour be reinstated?
Is there some other explanation?
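
In case it is relevant: the only knobs I am aware of that influence
host ordering live in the scheduler configuration (qconf -ssconf). On
our test cluster these are still at what I believe are the shipped 6.2
defaults:

queue_sort_method                 load
job_load_adjustments              np_load_avg=0.50
load_adjustment_decay_time        0:7:30
load_formula                      np_load_avg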

A simple example: We start with an empty cluster of eight four-CPU
machines and submit nine 3-slot fill_up jobs:

scmic@l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
Your job 135 ("STDIN") has been submitted
scmic@l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
Your job 136 ("STDIN") has been submitted
scmic@l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
Your job 137 ("STDIN") has been submitted
scmic@l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
Your job 138 ("STDIN") has been submitted
scmic@l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
Your job 139 ("STDIN") has been submitted
scmic@l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
Your job 140 ("STDIN") has been submitted
scmic@l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
Your job 141 ("STDIN") has been submitted
scmic@l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
Your job 142 ("STDIN") has been submitted
scmic@l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
Your job 143 ("STDIN") has been submitted
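
For anyone who wants to reproduce this on another cluster, the nine
submissions boil down to:

for i in $(seq 1 9); do echo sleep 100 | qsub -pe dmp 3; done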

After that, the job distribution looks as follows:

scmic@l5-auto-du ~ $ qhost -j
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
l5-auto-du              -               -     -       -       -       -       -
l5-node01               lx26-amd64      4  0.02    7.9G  370.0M    9.8G     0.0
   job-ID  prior   name       user         state submit/start at     queue      master ja-task-ID
   ----------------------------------------------------------------------------------------------
       141 0.55500 STDIN      scmic        r     07/20/2010 15:53:03 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
l5-node02               lx26-amd64      4  0.69    7.9G  339.4M    9.8G     0.0
       139 0.55500 STDIN      scmic        r     07/20/2010 15:53:03 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
l5-node03               lx26-amd64      4  0.03    7.9G  374.5M    9.8G     0.0
       135 0.55500 STDIN      scmic        r     07/20/2010 15:53:01 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
       143 0.55500 STDIN      scmic        r     07/20/2010 15:53:08 express-dm MASTER
l5-node04               lx26-amd64      4  4.92    7.9G  955.4M    9.8G     0.0
       137 0.55500 STDIN      scmic        r     07/20/2010 15:53:02 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
       143 0.55500 STDIN      scmic        r     07/20/2010 15:53:08 express-dm SLAVE
l5-node05               lx26-amd64      4  5.11    7.9G  951.3M    9.8G     0.0
       140 0.55500 STDIN      scmic        r     07/20/2010 15:53:03 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
l5-node06               lx26-amd64      4  0.03    7.9G  370.1M    9.8G     0.0
       136 0.55500 STDIN      scmic        r     07/20/2010 15:53:02 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
l5-node07               lx26-amd64      4  0.06    7.9G  381.3M    9.8G     0.0
       138 0.55500 STDIN      scmic        r     07/20/2010 15:53:02 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
       143 0.55500 STDIN      scmic        r     07/20/2010 15:53:08 express-dm SLAVE
l5-node08               lx26-amd64      4  0.02    7.9G  347.8M    9.8G     0.0
       142 0.55500 STDIN      scmic        r     07/20/2010 15:53:04 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE

Then I free up node 6 by deleting job 136 and submit a new job:

scmic@l5-auto-du ~ $ qdel 136
scmic has registered the job 136 for deletion
scmic@l5-auto-du ~ $ echo sleep 100 | qsub -pe dmp 3
Your job 144 ("STDIN") has been submitted

I'd expect job 144 to run entirely on the now-free node 6. Instead it ends up spread across nodes 1, 2 and 5, with its master on node 2:

scmic@l5-auto-du ~ $ qhost -j
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
l5-auto-du              -               -     -       -       -       -       -
l5-node01               lx26-amd64      4  0.02    7.9G  370.0M    9.8G     0.0
   job-ID  prior   name       user         state submit/start at     queue      master ja-task-ID
   ----------------------------------------------------------------------------------------------
       141 0.55500 STDIN      scmic        r     07/20/2010 15:53:03 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
       144 0.55500 STDIN      scmic        r     07/20/2010 15:53:26 express-dm SLAVE
l5-node02               lx26-amd64      4  0.69    7.9G  339.4M    9.8G     0.0
       139 0.55500 STDIN      scmic        r     07/20/2010 15:53:03 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
       144 0.55500 STDIN      scmic        r     07/20/2010 15:53:26 express-dm MASTER
l5-node03               lx26-amd64      4  0.03    7.9G  374.5M    9.8G     0.0
       135 0.55500 STDIN      scmic        r     07/20/2010 15:53:01 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
       143 0.55500 STDIN      scmic        r     07/20/2010 15:53:08 express-dm MASTER
l5-node04               lx26-amd64      4  4.92    7.9G  955.4M    9.8G     0.0
       137 0.55500 STDIN      scmic        r     07/20/2010 15:53:02 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
       143 0.55500 STDIN      scmic        r     07/20/2010 15:53:08 express-dm SLAVE
l5-node05               lx26-amd64      4  5.11    7.9G  951.3M    9.8G     0.0
       140 0.55500 STDIN      scmic        r     07/20/2010 15:53:03 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
       144 0.55500 STDIN      scmic        r     07/20/2010 15:53:26 express-dm SLAVE
l5-node06               lx26-amd64      4  0.03    7.9G  370.1M    9.8G     0.0
l5-node07               lx26-amd64      4  0.06    7.9G  381.3M    9.8G     0.0
       138 0.55500 STDIN      scmic        r     07/20/2010 15:53:02 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
       143 0.55500 STDIN      scmic        r     07/20/2010 15:53:08 express-dm SLAVE
l5-node08               lx26-amd64      4  0.02    7.9G  347.8M    9.8G     0.0
       142 0.55500 STDIN      scmic        r     07/20/2010 15:53:04 express-dm MASTER
                                                                     express-dm SLAVE
                                                                     express-dm SLAVE
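
If a per-queue-instance view is easier to read, the same MASTER/SLAVE
assignment can also be listed with:

qstat -g t -u scmic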

Any help would be greatly appreciated.

Thanks in advance,
-- 
Michael Weiser                science + computing ag
Senior Systems Engineer       Geschaeftsstelle Duesseldorf
                              Martinstrasse 47-55, Haus A
phone: +49 211 302 708 32     D-40223 Duesseldorf
fax:   +49 211 302 708 50     www.science-computing.de
-- 
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Dr. Roland Niemeier, 
Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Michel Lepert
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196
