[GE users] Fwd: Subordinate queue help

futurity neil at futurity.co.uk
Thu Apr 29 15:30:21 BST 2010



Hi Richard,

Thank you for your suggestion.  It certainly keeps everything very simple.  We already use host groups and assign them into different queues to make the optimal CPU core selection easier.

The other reason for choosing to use 12 "low" queues is that the associated 12 "high" queues can suspend just a single job rather than all the jobs on the same machine.

Although I'm experiencing teething problems with this method (suspended resources don't seem to be freed up), it does seem to work.  Therefore, on a busy grid with only a handful of high priority jobs being submitted, all the cores that the high priority jobs are not running on are free to continue running "low" priority jobs, making maximum use of the grid CPU cores.
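
The remark about suspended resources not being freed matches what Daniel says further down the thread about consumables. The following is only a toy model in Python, not Grid Engine code: the `Host` class and its method names are hypothetical, and it assumes each job requests exactly one unit of the "cores" consumable.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy model of one execution host with a "cores" consumable."""
    cores: int
    # job name -> "running" or "suspended" (hypothetical bookkeeping)
    jobs: dict = field(default_factory=dict)

    def free_cores(self):
        # Every job on the host counts, suspended or not: in this model
        # suspension does not hand the consumable back.
        return self.cores - len(self.jobs)

    def try_dispatch(self, name, request=1):
        """Place a job only if enough of the consumable is left."""
        if self.free_cores() < request:
            return False
        self.jobs[name] = "running"
        return True

    def suspend(self, name):
        self.jobs[name] = "suspended"


host = Host(cores=2)
host.try_dispatch("low-1")
host.try_dispatch("low-2")

# Both cores are booked, so the high job is never placed on the host
# and therefore never gets the chance to suspend a low job.
high_dispatched = host.try_dispatch("high-1")

# Suspending a low job by hand does not help either.
host.suspend("low-1")
still_blocked = not host.try_dispatch("high-1")
```

In this model the high job stays pending in both cases, which is the behaviour reported in the original question.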

I'm not aware of a way to implement this with a single "low" queue and a single "high" queue.

I hope my explanation makes sense.

Neil

On 29 April 2010 15:12, rems0 <Richard.Ems at cape-horn-eng.com> wrote:
Hi Neil,

see my comment below.

On 04/29/2010 02:39 PM, futurity wrote:
> Hi Daniel,
>
> We specify complexes because the low queue is made from many smaller
> queues (L1 up to L12).  As many different machine types are in the low
> queue, we can't put them all into one queue and specify a common number
> of slots as some have 2 cores, others 4 or 8 cores.

You can! At least we do.

We have only one queue defined, but 3 hostgroups for 3 different types of
node hardware. For any of the entries in the queue definition you can
define a default global value for all instances, and particular values
for single instances or groups:

hostlist              @c3 @c4 @c5 c5n24
seq_no                4,[@c3=3],[@c4=2],[@c5=1]
pe_list               NONE,[@c3=c3_pe],[@c4=c4_pe],[@c5=c5_pe]
slots                 4,[@c5=8],[c5n24=16]
xuser_lists           NONE,[@c3=c3_excluded_user_list]

The slots line is the one you could use to set the available slots for
each of your hostgroups! That way you keep it simple, with fewer queues
that each have many instances. You can disable/enable/suspend per instance, etc.
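
To make the override syntax above concrete, here is a small Python sketch that resolves one attribute value for a given host. It is not part of Grid Engine. The function names, the host names `c3n01` and `c5n01`, and the hostgroup memberships are made up for illustration, and the precedence used (single host over hostgroup over default) is an assumption you should check against queue_conf(5).

```python
import re


def parse_attr(value):
    """Split 'default,[target=value],...' into (default, overrides)."""
    default, *rest = value.split(",", 1)
    # Each override looks like [@group=value] or [host=value].
    overrides = dict(re.findall(r"\[([^=\]]+)=([^\]]+)\]", rest[0] if rest else ""))
    return default.strip(), overrides


def resolve(value, host, hostgroups):
    """Return the setting that applies to one queue instance (host)."""
    default, overrides = parse_attr(value)
    if host in overrides:                      # host-specific entry wins
        return overrides[host]
    for group, members in hostgroups.items():  # then a hostgroup entry
        if group in overrides and host in members:
            return overrides[group]
    return default                             # otherwise the default


# Hypothetical memberships; only the group names come from the queue above.
hostgroups = {"@c3": {"c3n01"}, "@c4": {"c4n01"}, "@c5": {"c5n01"}}
slots = "4,[@c5=8],[c5n24=16]"

slots_c3 = resolve(slots, "c3n01", hostgroups)     # falls back to the default
slots_c5 = resolve(slots, "c5n01", hostgroups)     # picks up the @c5 override
slots_c5n24 = resolve(slots, "c5n24", hostgroups)  # picks up the host override
```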

Richard


>
> Also, by specifying a sub queue per core, we can transfer jobs to these
> queues in the order that we want jobs to run on the cores.  i.e. we can
> use up the first core of all machines before moving onto the second
> core, etc.  This works very well and enables jobs to run on the fastest
> possible CPU available.
>
> Earlier this morning I found the solution to our problem.  We use a
> consumable resource "cores" within the grid as some jobs wish to reserve
> all the cores on a single machine for a single job.  Running 2 low jobs
> on a single 2 core machine uses up this "cores" resource and therefore
> high jobs won't be transferred to it as they require a core but none are
> available.  As the high job isn't transferred, it doesn't suspend the
> running low jobs.
>
> Thank you for your reply,
>
> Neil
>
> On 29 April 2010 10:04, dagru <d.gruber at sun.com> wrote:
>
>     The point you miss is that the high priority job does not
>     find free resources in order to be scheduled. Suspended
>     jobs also do not free resources (consumables).
>     Why wouldn't it be enough when you request your queues
>     (both high priority queues for high priority jobs) instead
>     of the complexes?
>
>     Daniel
>
>     On 04/28/10 11:10, futurity2 wrote:
>>>     Hi,
>>>
>>>     I was wondering if someone could point me at some information on
>>>     subordinate queues as I'm experiencing strange behaviour.
>>>
>>>     We're using grid engine 62u5
>>>
>>>     I like to have a queue for each CPU core in our execution hosts for
>>>     greater control of which cores are used first.
>>>
>>>     For this example let's say I'm using a single exec host which is a 2
>>>     core machine, so I have 2 queues (L1 & L2). Each queue has one slot.
>>>     Each has a complex "qp" set to a value of "low". When I submit jobs
>>>     to the grid with "qsub -l qp=low job.sh" everything runs as expected
>>>     on both these 2 queues (2 jobs at a time).
>>>
>>>     I then create two new queues (H1 & H2) with complex "qp" = "high"
>>>     and made L1 a subordinate of H1 and L2 a subordinate of H2.
>>>
>>>     Now if I submit 10x qp=low jobs, two jobs transfer and run on the
>>>     single execution machine. If I submit 10x qp=high jobs, none of them
>>>     seem to transfer until a slot is free which I wasn't expecting. When
>>>     a slot becomes free it transfers the high job to the non free slot
>>>     suspending the low job, but the free slot is still left unused even
>>>     though more high jobs remain queued.
>>>
>>>     Any advice or URL for further reading would be gratefully received.
>>>
>>>     Kind Regards
>>>
>>>     Neil
>>>
>>>     This message was tapped on a iPhone.
>>>
>>     ------------------------------------------------------
>>     http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=255258
>>
>>     To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
>
>


--
Richard Ems       mail: Richard.Ems at Cape-Horn-Eng.com

Cape Horn Engineering S.L.
C/ Dr. J.J. Dómine 1, 5º piso
46011 Valencia
Tel : +34 96 3242923 / Fax 924
http://www.cape-horn-eng.com

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=255444

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email
______________________________________________________________________




