[GE users] Allocation rule behavior

jcd jcducom at gmail.com
Thu Aug 20 20:39:34 BST 2009


All-
To reduce job 'segmentation' over a large number of nodes with free
slots, I set up the allocation rule of the PE ompi to match the number
of slots available on each node. (For a parallel job using 4 cores, the
$fill_up or $round_robin method will put processes on any available
slots, for instance 1 process on host1, 1 process on host2, and
2 processes on host3.)

The configuration of the PE:
# qconf -sp ompitest
pe_name            ompitest
slots              302
user_lists         crc
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    4
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE


The job submission script is:
#!/bin/csh
#$ -pe ompitest 2
#$ -q *@@nehalem
module load ompi/1.3.2-intel
mpirun  -np $NSLOTS hostname


This job waits forever in the queue:
$ qstat -u jducom
job-ID  prior   name       user         state submit/start at     queue                           slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
   23326 0.51167 openmpi.sh jducom       qw    08/20/2009 15:12:57                                2

The reason given by the scheduler is the following:
$ qstat -j 23327
cannot run in PE "ompitest" because it only offers 0 slots

This is quite surprising, as the PE has plenty of slots available.




As soon as the allocation rule is changed to $fill_up, the job is 
scheduled immediately as expected:
$ qstat -u jducom
job-ID  prior   name       user         state submit/start at     queue                           slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
   23326 0.51167 openmpi.sh jducom       t     08/20/2009 15:14:39 long@dqcneh012.crc.nd.edu          2


Bottom line: as long as the job requests a number of slots (NSLOTS)
that is a MULTIPLE of the number of slots specified in the allocation
rule, the job is scheduled. If it is not a multiple, it waits in the
queue.
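
To make the arithmetic concrete, here is a minimal sketch of the
behavior I observed. It is not taken from the scheduler: hosts_needed
is a hypothetical helper, and it assumes that a fixed integer
allocation_rule places exactly that many slots on every host the job
uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): models the behavior
# observed above for a fixed integer allocation_rule.
# Usage: hosts_needed <allocation_rule> <requested_slots>
# Prints the number of hosts the job would span, or returns non-zero
# when the request is not a multiple of the rule.
hosts_needed() {
    rule=$1
    nslots=$2
    if [ "$rule" -le 0 ] || [ "$nslots" -le 0 ]; then
        echo "rule and nslots must be positive integers" >&2
        return 2
    fi
    if [ $((nslots % rule)) -ne 0 ]; then
        echo "$nslots slots is not a multiple of allocation_rule $rule" >&2
        return 1
    fi
    echo $((nslots / rule))
}
```

With the PE above (allocation_rule 4), "hosts_needed 4 2" fails, which
matches the job stuck in qw, while "hosts_needed 4 8" reports 2 hosts.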

I was wondering whether this is the expected behavior.
Thank you

JC
