[GE users] Scheduling of long jobs

Patrice Seyed apseyed at bu.edu
Tue Aug 24 16:22:54 BST 2004

That being said, in general subordinate queues work well for separating short
and long jobs: the long jobs are suspended so that short jobs can run and then
get out of the way.

But what is the better strategy when you have jobs that run for
weeks/months, and many of the users are submitting this type of job?
(Something better than manually changing the maxjobs-per-user parameter
relative to the queues available at a given time.)


-----Original Message-----
From: Patrice Seyed [mailto:apseyed at bu.edu] 
Sent: Thursday, August 19, 2004 7:17 PM
To: 'users at gridengine.sunsource.net'
Subject: RE: [GE users] Scheduling of long jobs

Interesting, Reuti, but like you said, a job in long00b stays there even if
long00a is open. I'm not sure if this method is "smarter" than mine, but
with your method you don't have a possible scenario where 3 jobs are running
over 2 cpus. Even though that can occur with mine, it can last no more than
2 hours (hard limit on express queues), unless there are jobs waiting to
get into an express queue.

I agree it would be nice to be able to suspend a slot instead of a queue.
The current setup is more attuned for single cpu jobs, and also for my
cluster making a queue for each single cpu doesn't seem feasible.


-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Thursday, August 19, 2004 4:47 PM
To: users at gridengine.sunsource.net
Subject: [GE users] Scheduling of long jobs

Hi Patrice,

>I am using SGEE 5.3, and what I have done so far is deploy the concept of
>"express" queues: when two or more jobs are running in these queues, jobs
>in the regular queues are suspended until there is 1 or fewer in the
>express queue, and the express queue has a 2 hour limit. Since my
>machines are dual-CPU I could not do hierarchical queues, in terms of
>walltime, unless I made a queue for each job slot. Also I am aware of the
>max jobs per user limit; this can help, but since it is a hard limit it
>can also restrict jobs even when queues are open.

Yes, this is also the way I solved it. But you will need only three queues
per node, and can adjust them in a way that unnecessary suspends are avoided:

$ qconf -sq long00a
qname                long00a
hostname             node00
seq_no               11
slots                1

$ qconf -sq long00b
qname                long00b
hostname             node00
seq_no               21
slots                1

$ qconf -sq short00 
qname                short00
hostname             node00
seq_no               51
slots                2
subordinate_list     long00a=2, long00b=1

When you also set "queue_sort_method seqno", the long jobs will go to long..a
first, but in case of a short job the long..b will be suspended first. Yes,
it's not perfect, because no job will change from queue long..b to long..a
when the job in long..a finishes.
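
As a rough illustration of the thresholds above, here is a minimal Python
sketch of which long queues get suspended as short00 fills up, and which long
queue a new long job would be sent to. The queue names, seq_no values and
thresholds are taken from the qconf output; the function names are made up
for this example, and the model ignores load and everything else the real
scheduler looks at.

```python
# Hypothetical model of the three-queue setup on node00; not part of SGE.
# Values are copied from the qconf output in this message.
LONG_QUEUES = {"long00a": 11, "long00b": 21}      # qname -> seq_no
SUBORDINATE_LIST = {"long00a": 2, "long00b": 1}   # qname -> threshold in short00


def suspended_queues(used_short_slots):
    """Return the long queues suspended for a given number of busy short00 slots."""
    return sorted(
        qname
        for qname, threshold in SUBORDINATE_LIST.items()
        if used_short_slots >= threshold
    )


def pick_long_queue(free_queues):
    """Pick the free long queue with the lowest seq_no (simplified seqno sorting)."""
    candidates = [q for q in free_queues if q in LONG_QUEUES]
    return min(candidates, key=LONG_QUEUES.get) if candidates else None


if __name__ == "__main__":
    for used in range(3):
        print(used, "short job(s): suspended =", suspended_queues(used))
    print("first long job goes to", pick_long_queue(["long00a", "long00b"]))
```

With one short job only long00b is suspended, so a long job in long00a keeps
running next to it; with two short jobs both long queues are suspended.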

On the other hand, even with four queues per host, you would have the same
load on a machine, whether there are 2 long or (1 long + 1 short) jobs
running, and the scheduler will select one machine for your new short job.

Maybe it would be an enhancement to SGE if you could specify not to suspend
the whole subordinated queue, but only as many slots as are taken in the
superordinated queue. The next enhancement would be to do a round robin
over all the used slots in the subordinated queue, so that they share the
remaining slot over time, e.g. to switch between the running jobs there
every 5 minutes. Do you think it's worth entering in Issuezilla?
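
To make the proposal a bit more concrete, here is a minimal Python sketch of
the suggested behaviour. It only models the idea and does not describe
anything SGE does: the function name and the job labels are hypothetical, and
one "tick" stands for one time slice (5 minutes in the example above).

```python
from collections import deque


def slotwise_schedule(long_jobs, used_short_slots, ticks):
    """Yield (running, suspended) lists of long jobs, one pair per time slice.

    Only as many long jobs are suspended as short slots are occupied, and the
    suspended position rotates through the jobs on every slice (round robin).
    """
    order = deque(long_jobs)
    n_suspend = min(used_short_slots, len(order))
    for _ in range(ticks):
        jobs = list(order)
        yield jobs[n_suspend:], jobs[:n_suspend]
        order.rotate(-1)  # the next job takes its turn being suspended


if __name__ == "__main__":
    # Two long jobs on a dual-CPU node, one short job holding one slot.
    for running, suspended in slotwise_schedule(["A", "B"], 1, 4):
        print("running:", running, "suspended:", suspended)
```

With one short job running, the two long jobs alternate on the remaining slot
instead of one of them staying suspended the whole time.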

Cheers - Reuti

To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net