[GE users] Scheduling of long jobs

Reuti reuti at staff.uni-marburg.de
Thu Aug 19 21:47:05 BST 2004



Hi Patrice,

>I am using SGEE 5.3, and what I have done so far is deploy the concept of
>"express" queues: when two or more jobs are submitted to these queues, jobs
>in the regular queues are suspended until there is one job or fewer in the
>express queue. The express queue has a 2-hour limit. Since my machines are
>dual-CPU, I could not do hierarchical queues, in terms of walltime, unless I
>made a queue for each job slot. I am also aware of the max jobs per user
>limit; this can help, but since it is a hard limit it can also restrict
>when queues are open.

Yes, this is also the way I solved it. But you will need only three queues per 
node, adjusted so that unnecessary suspends are avoided:

$ qconf -sq long00a
qname                long00a
hostname             node00
seq_no               11
..
slots                1
..

$ qconf -sq long00b
qname                long00b
hostname             node00
seq_no               21
..
slots                1
..

$ qconf -sq short00 
qname                short00
hostname             node00
seq_no               51
..
slots                2
..
subordinate_list     long00a=2, long00b=1
..

When you also set "queue_sort_method seqno", the long jobs will go to long..a 
first, but when a short job arrives, long..b will be suspended first. Yes, 
it's not perfect, because no job will move from queue long..b to long..a 
when the job in long..a finishes.
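
To make the thresholds concrete, here is a small sketch of the logic in Python. It is only a model of the three queues shown above, not SGE code; the names QUEUES, SUBORDINATES, fill_order and suspended_queues are made up for this illustration. It assumes that a subordinated queue is suspended once the number of used slots in short00 reaches the threshold given in subordinate_list.

```python
# Hypothetical model of the three queues on node00; the values are copied
# from the qconf output above. This is an illustration, not SGE code.
QUEUES = {
    "long00a": {"seq_no": 11, "slots": 1},
    "long00b": {"seq_no": 21, "slots": 1},
    "short00": {"seq_no": 51, "slots": 2},
}

# subordinate_list of short00: queue name -> number of used short00 slots
# at which that queue is suspended (assumed semantics: used >= threshold).
SUBORDINATES = {"long00a": 2, "long00b": 1}


def fill_order(names):
    """Order queues as "queue_sort_method seqno" would: lowest seq_no first."""
    return sorted(names, key=lambda name: QUEUES[name]["seq_no"])


def suspended_queues(short_slots_used):
    """Return the long queues that are suspended for a given short00 usage."""
    return sorted(
        name
        for name, threshold in SUBORDINATES.items()
        if short_slots_used >= threshold
    )


if __name__ == "__main__":
    print("long jobs fill:", fill_order(["long00b", "long00a"]))
    for used in range(3):
        print(used, "short job(s) ->", suspended_queues(used))
```

With one short job only long00b is suspended; long00a, which fills first, is touched only when both short slots are in use.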

On the other hand, even with four queues per host, you would have the same 
load on a machine whether 2 long jobs or (1 long + 1 short) job are running, 
and the scheduler will select one machine for your new short job. 

Maybe it would be an enhancement to SGE if you could specify that not the 
whole subordinated queue is suspended, but only as many slots as are taken in 
the superordinated queue. The next enhancement would be a round robin over all 
the used slots in the subordinated queue, so that they share the remaining 
slot over time, e.g. switching between the running jobs there every 5 
minutes. Do you think it's worth entering in Issuezilla?
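
A rough sketch of what I have in mind, in Python. Nothing like this exists in SGE as far as this mail is concerned; slots_to_suspend and active_long_jobs are hypothetical names, and one "interval" stands for the switching period (e.g. 5 minutes).

```python
def slots_to_suspend(short_slots_used, long_jobs_running):
    """Suspend only as many long jobs as short slots are taken."""
    return min(short_slots_used, long_jobs_running)


def active_long_jobs(long_jobs, short_slots_used, interval):
    """Return the long jobs allowed to run during a given time interval.

    The starting point rotates by one job per interval, so the jobs take
    turns on whatever slots remain (round robin).
    """
    total = len(long_jobs)
    if total == 0:
        return []
    keep = total - slots_to_suspend(short_slots_used, total)
    start = interval % total
    return [long_jobs[(start + i) % total] for i in range(keep)]


if __name__ == "__main__":
    jobs = ["long_job_1", "long_job_2"]  # hypothetical job names
    for interval in range(4):  # e.g. one interval = 5 minutes
        print(interval, active_long_jobs(jobs, short_slots_used=1, interval=interval))
```

With one short job on a dual-CPU node, the two long jobs would alternate on the remaining slot instead of one of them staying suspended the whole time.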

Cheers - Reuti
