[GE users] Running 1, 2, 3, or 4 jobs on a host with 4 slots

Marconnet, James E Mr /Computer Sciences Corporation james.marconnet at smdc.army.mil
Wed Mar 16 23:26:49 GMT 2005


Thanks Reuti! This is slowly making more sense to me as I consider your
suggestion and read and re-read the documentation.

I had not caught from my earlier reading of the documentation that
subordination of queues works on a node-by-node basis. I was thinking that it
subordinated an entire queue at a time (all of its instances on every node),
not a single queue instance.

But I'm still struggling a little with the exact number to put in to trigger
subordination.

You suggested: max2=1,max3=1,max4=1 (see below)

As I understand it, with the number set to 1, submitting 1 job to a queue
allowing 3 slots per node would immediately subordinate all the other queues
listed, as soon as just one (1) job starts running on this node. So only this
particular queue, once it was in use, could ever fill that particular node,
up to a total of 3 slots used.

Then it seems that if just 1 job finished out of the 1, 2, or 3 jobs running
on that node, all the queues would be un-subordinated, leaving things wide
open again? With just 1 or 2 jobs running, it seems that jobs submitted to a
queue that allowed 50 slots could then quickly oversubscribe this node, way
past the 3 jobs/node that these particular jobs were expecting to
experience.

Another possibility: could the number be set to 3, to let other queues fill
in the rest of the 3 slots one at a time? Or would that let them go ahead
and fill the node beyond 3 slots, because this particular subordination
"rule/setting" would never be reconsidered when assigning the next job from
another queue? I'm thinking that the only safe number to use is 1, as you
suggested.
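
To make the threshold question concrete, here is a toy Python model. It is
only a sketch under one assumption, based on my reading of queue_conf: the
queues named in subordinate_list stay suspended on a host for as long as the
number of used slots in this queue instance is at or above the number after
the "=". The function name is made up for illustration and is not part of
Grid Engine.

```python
def subordinates_suspended(used_slots, threshold):
    """Toy model of one subordinate_list entry on a single host.

    Assumption (not verified against a live cluster): the subordinated
    queue instance is suspended while used_slots >= threshold.
    """
    return used_slots >= threshold


# Compare the two candidate settings for a queue instance with 3 slots.
for threshold in (1, 3):
    states = {used: subordinates_suspended(used, threshold) for used in range(4)}
    print(f"threshold={threshold}: {states}")
```

If that assumption holds, a threshold of 1 keeps the other queues suspended
until the last job in this queue has finished, while a threshold of 3 leaves
them open whenever only 1 or 2 jobs are running here.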

If these numbers were left blank, then all of the slots would have to be used
up (3 in this case, by this particular queue - or perhaps more if partially
filled by this queue and then filled more by another less-restrictive queue)
before the subordinated queue would be suspended. Would that make sense in my
case? I don't think so. In what sort of situation would leaving it blank ever
make sense?
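
A rough way to see what a blank entry could mean for oversubscription is
sketched below in Python. It assumes that a blank threshold behaves like a
threshold equal to the queue's own slot count, and it considers only one
other, less restrictive queue instance on the same host (50 slots, as in my
example above). The function and parameter names are hypothetical.

```python
def max_jobs_on_host(running, own_slots, other_slots, threshold=None):
    """Worst-case job count on one host under the assumed rule.

    running:     jobs currently running in the restrictive queue instance
    own_slots:   slot count of that queue instance (3 in the example)
    other_slots: slots of one less restrictive queue instance on the host
    threshold:   number after '=' in subordinate_list; None models a blank
    """
    if threshold is None:
        threshold = own_slots  # assumed default: all slots must be filled
    if running >= threshold:
        return running  # the other queue instance is suspended
    return running + other_slots  # the other queue instance can still fill up


for threshold in (1, None):
    counts = [max_jobs_on_host(n, 3, 50, threshold) for n in (1, 2, 3)]
    print(f"threshold={threshold}: worst case for 1, 2, 3 running jobs = {counts}")
```

Under these assumptions, a threshold of 1 caps the host at the jobs of this
queue, while a blank entry leaves room for the 50-slot queue to pile on until
this queue happens to be completely full.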

I'm getting closer to understanding this, and I hope this helps others
understand it more easily than I have.
Jim

-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Monday, March 14, 2005 5:23 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Running 1, 2, 3, or 4 jobs on a host with 4 slots

Hi,

my idea was really to suspend all three other queues on the same machine in
all cases. In your setup, you may already have 3 jobs running which allow "4
on the machine" when another job for exclusive use comes in - your three
other jobs in max4 will then be suspended (but their memory not released,
nor disk freed). Suspending the one-job queue beforehand would prevent this
from happening, as no job can get in.

Subordinated queues are only affected on the same host - the node-by-node
basis you are looking for - although they are defined in the cluster queue
(man queue_conf). You don't have to fear that the whole cluster is blocked.
I was only thinking of this: if you have a huge bunch of max4 jobs, an
exclusive 1-node job will wait a while, because another job may always slip
in.

If the reason for having only one job on a node is memory and/or scratch
disk space, it would be better to request this resource, because this would
fit better into the possible resource reservation in SGE and would prevent
jobs from waiting endlessly.

If you encounter problems with three jobs on a node - this shouldn't be! My
thoughts were only about performance.

CU - Reuti

> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> 
> <snip>
> 
> So, define 4 queues with the different settings in each of them:
> 
> max1:
> slots 1
> subordinate_list max2=1,max3=1,max4=1
> 
> max2:
> slots 2
> subordinate_list max1=1,max3=1,max4=1
> 
> max3:
> slots 3
> subordinate_list max1=1,max2=1,max4=1
> 
> max4:
> slots 4
> subordinate_list max1=1,max2=1,max3=1

<snip>
