[GE users] Allocation rule behavior

jcd jcducom at gmail.com
Fri Aug 21 03:45:58 BST 2009



Reuti-
We do use the ompi* solution, indeed. I haven't played with JSV yet; it looks like a very nice alternative.
Thank you very much for your always prompt and valuable help.
JC



On Thu, Aug 20, 2009 at 4:50 PM, reuti <reuti at staff.uni-marburg.de> wrote:
On 20.08.2009, at 22:26, jcd wrote:

> Reuti and Dan-
> Thanks for the feedback. Indeed the goal is to make ompi-2way,
> ompi-4way, etc... but I wanted to keep the number of PE minimal i.e.
> less confusing for users
> Thanks again

Once you have 6.2u3 you could use a JSV (job submission verifier)
to attach the correct PE for the user. I mean:

- define "ompi" as a PE but don't attach it to any queue: this is
the one the user will request
- use the JSV to replace the PE request with the one that fits the
requested slot count, e.g. "ompi2", "ompi4", ... (a sketch follows
below)

Another option would be to request "-pe ompi* 4" and let SGE select
one of the matching PEs (see the example below).
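
For illustration, the submit script from the original post, changed only
in the PE request (this assumes PEs matching "ompi*", e.g. ompi2 and
ompi4, are attached to the queue):

#!/bin/csh
#$ -pe ompi* 4
#$ -q *@@nehalem
module load ompi/1.3.2-intel
mpirun -np $NSLOTS hostname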

-- Reuti


> JC
>
> reuti wrote:
>> On 20.08.2009, at 21:39, jcd wrote:
>>
>>> All-
>>> To reduce job 'segmentation' over a large number of nodes with free
>>> slots, I set up the allocation rule of the PE ompi to match the
>>> number of slots available on each node. (For a parallel job using
>>> 4 cores, the fill_up or round_robin method will put processes on any
>>> available slots, for instance 1 process on host1, 1 process on host2,
>>> and 2 processes on host3.)
>>>
>>> The configuration of a PE:
>>> # qconf -sp ompitest
>>> pe_name            ompitest
>>> slots              302
>>> user_lists         crc
>>> xuser_lists        NONE
>>> start_proc_args    /bin/true
>>> stop_proc_args     /bin/true
>>> allocation_rule    4
>>> control_slaves     TRUE
>>> job_is_first_task  FALSE
>>> urgency_slots      min
>>> accounting_summary FALSE
>>>
>>>
>>> The job submission script is:
>>> #!/bin/csh
>>> #$ -pe ompitest 2
>>> #$ -q *@@nehalem
>>> module load ompi/1.3.2-intel
>>> mpirun  -np $NSLOTS hostname
>>>
>>>
>>> The previous job waits forever in the queue:
>>> $ qstat -u jducom
>>> job-ID  prior   name       user   state submit/start at     queue                      slots ja-task-ID
>>> -------------------------------------------------------------------------------------------------------
>>>   23326 0.51167 openmpi.sh jducom qw    08/20/2009 15:12:57                                2
>>>
>>> The reason given by the scheduler is the following:
>>> $ qstat -j 23327
>>> cannot run in PE "ompitest" because it only offers 0 slots
>>>
>>> which is quite surprising, since the PE has plenty of slots available.
>>>
>>> As soon as the allocation rule is changed to $fill_up, the job is
>>> scheduled immediately as expected:
>>> $ qstat -u jducom
>>> job-ID  prior   name       user         state submit/start at
>>> queue
>>>                           slots ja-task-ID
>>> --------------------------------------------------------------------
>>> --
>>> -------------------------------------------
>>>    23326 0.51167 openmpi.sh jducom       t     08/20/2009 15:14:39
>>> long at dqcneh012.crc.nd.edu<mailto:long at dqcneh012.crc.nd.edu>          2
>>>
>>>
>>> Bottom line: as long as the job requests a number of NSLOTS that is
>>> a MULTIPLE of the number of slots specified in the allocation rule,
>>> the job goes through. If it is not a multiple, it waits in the queue.
>>>
>>> I was wondering if this is the expected behavior.
>>
>> Yes. Maybe you need a second PE with allocation rule 2.
>>
>> -- Reuti
>>
>>> Thank you
>>>
>>> JC
>>>
>>
>>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213354

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].




More information about the gridengine-users mailing list