[GE users] Allocation rule behavior

templedf dan.templeton at sun.com
Thu Aug 20 20:46:55 BST 2009


Yes, that is actually expected behavior.  Your PE's allocation rule 
requires exactly four slots on each node, and your job doesn't request 
four slots, so the scheduler doesn't know what to do with it.  Whether 
or not that *should be* expected behavior is a different question.
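
To get a job through under this PE as written, the slot request has to 
be a multiple of the allocation rule.  A minimal sketch of such a 
submission (reusing the PE, queue, and module names from the script 
quoted below):

#!/bin/csh
# Request 4 slots: one full group under allocation_rule 4, so the
# scheduler can place all of them on a single node.
#$ -pe ompitest 4
#$ -q *@@nehalem
module load ompi/1.3.2-intel
mpirun -np $NSLOTS hostname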

Daniel
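
The other way out is the one JC describes below: relax the allocation 
rule itself.  A rough sketch, using qconf's modify-PE option:

qconf -mp ompitest
# then, in the editor, change
#   allocation_rule    4
# to e.g.
#   allocation_rule    $fill_up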

jcd wrote:
> All-
> To reduce job 'segmentation' over a large number of nodes with free 
> slots, I set up the allocation rule of the PE ompitest to match the 
> number of slots available on each node. (For a parallel job using 4 
> cores, the fill_up or round_robin method will put processes on any 
> available slots, for instance 1 process on host1, 1 process on host2, 
> and 2 processes on host3.)
>
> The configuration of the PE:
> # qconf -sp ompitest
> pe_name            ompitest
> slots              302
> user_lists         crc
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    4
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
>
>
> The job submission script is:
> #!/bin/csh
> # Request 2 slots from the ompitest PE, restricted to the @nehalem host group
> #$ -pe ompitest 2
> #$ -q *@@nehalem
> module load ompi/1.3.2-intel
> mpirun -np $NSLOTS hostname
>
>
> The job above waits forever in the queue:
> $ qstat -u jducom
> job-ID  prior   name       user    state submit/start at      queue                        slots ja-task-ID
> -------------------------------------------------------------------------------------------------------------
>   23326 0.51167 openmpi.sh jducom  qw    08/20/2009 15:12:57                                   2
>
> The reason given by the scheduler is the following:
> $ qstat -j 23327
> cannot run in PE "ompitest" because it only offers 0 slots
>
> which is quite surprising, as the PE has plenty of slots available.
>
>
>
>
> As soon as the allocation rule is changed to $fill_up, the job is 
> scheduled immediately as expected:
> $ qstat -u jducom
> job-ID  prior   name       user    state submit/start at      queue                        slots ja-task-ID
> -------------------------------------------------------------------------------------------------------------
>   23326 0.51167 openmpi.sh jducom  t     08/20/2009 15:14:39  long@dqcneh012.crc.nd.edu        2
>
>
> Bottom line: as long as the job requests a number of slots (NSLOTS) 
> that is a MULTIPLE of the number specified in the allocation rule, the 
> job goes through. If it is not a multiple, it waits in the queue.
>
> I was wondering if this is the expected behavior.
> Thank you
>
> JC
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213345

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
