[GE users] Reserving a queue for memory usage a <= b

Reuti reuti at staff.uni-marburg.de
Tue Sep 23 10:39:46 BST 2008

Hi Aaron,

On 23.09.2008 at 11:24, Aaron Turner wrote:

> Hello SGE users,
> What I am looking to achieve is reserving some machines for higher  
> memory jobs.
> The idea would be for:
> 1. Jobs capable of being scheduled on the smaller memory machines  
> (memory usage <= a) going to those machines.

You can set up a consumable complex (better: make the existing
virtual_free or h_vmem consumable) and set a sensible value for each
node in the exechost definition. You then have to request this
complex when you submit the job.
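
A minimal sketch of those three steps, assuming h_vmem is the complex you make consumable. The host name node01, the sizes 8G and 2G, and job.sh are placeholders and not taken from your setup. The function only prints what you would enter, it does not touch the cluster:

```shell
#!/bin/sh
# Sketch only: prints the configuration steps instead of applying them.
# node01, 8G, 2G and job.sh are made-up placeholders.
show_consumable_setup() {
    host=$1
    host_mem=$2
    job_mem=$3
    # 1. In "qconf -mc", set the consumable column of h_vmem to YES
    #    (columns: name shortcut type relop requestable consumable default urgency)
    echo "h_vmem h_vmem MEMORY <= YES YES 0 0"
    # 2. In "qconf -me ${host}", give the node its memory
    echo "complex_values h_vmem=${host_mem}"
    # 3. Request the complex at submit time
    echo "qsub -l h_vmem=${job_mem} job.sh"
}

show_consumable_setup node01 8G 2G
```

With the consumable in place, each running job is subtracted from the value set on the host, so a node stops accepting jobs once its memory is used up.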

> 2. Jobs of greater memory usage than a but less than b going to the  
> high memory machines.
> 3. Very high memory jobs we can't accommodate get rejected and the
> user alerted.

See below.

> 4. The high memory machines being kept as busy as possible.

You can order the queue instances by giving them a sequence number in
the queue configuration and setting the scheduler to sort by seqno
(queue_sort_method seqno in the scheduler configuration):

seq_no 0,[@big_machines=10],[@small_machines=20]
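
To illustrate the effect: with sorting by seqno the scheduler tries the queue instances with the lowest sequence number first. The following is only a sketch of that ordering, not something SGE runs. The queue name all.q and the host names are assumptions, and the values 10 and 20 are the ones from the seq_no line:

```shell
#!/bin/sh
# Sketch of the ordering produced by sorting queue instances by seq_no.
# all.q and the host names are made up for illustration.
order_by_seqno() {
    # Input lines: "<queue instance> <seq_no>"; lowest seq_no comes first,
    # ties are listed by name.
    sort -k2,2n -k1,1
}

printf '%s\n' \
    'all.q@small01 20' \
    'all.q@big01 10' \
    'all.q@small02 20' | order_by_seqno
```

The instance on the big machine is listed first, so it is filled before the small ones as long as it can satisfy the job's request.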

> 5. Any user with a job of memory usage greater than a and less than  
> b having the minimum wait possible before their job starts running.
> In an ideal world all jobs would be checkpointable and submitted as  
> such so I could simply reduce the time slice down to a shortish  
> time and simply get the jobs rescheduled for additional processing.  
> This would also make the fair sharing a bit less clumpy. But I am  
> still not convinced the checkpointing issue is fully solved for  
> arbitrary code.

There is no "issue" with SGE regarding checkpointing; it is simply
not designed to do it on its own. SGE will support checkpointing if
it's built into the application or provided by a third-party library.
It's not the intention of SGE to offer checkpointing facilities itself.

> So given this what is the best way to approach it? I did try  
> setting up a series of subordinations for the queues with a shorter  
> queue to absorb excess  jobs but not lock up the high memory  
> machine for a long time period with them but it doesn't seem to  
> operate quite as I would have hoped. Is there a better way of  
> approaching this, such as adding a complex to do this? The base  
> complex relationships offer >= and <= but not a <= b <= c!

You can submit jobs with:

-w e

and you will get an error message if there aren't any queues/hosts at  
all to satisfy the request.
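
Combined with a memory request this could look like the sketch below. It assumes the memory complex you request is h_vmem; submit_cmd is a hypothetical helper, and 64G and job.sh are placeholders. The helper only prints the command line so you can check it before using it:

```shell
#!/bin/sh
# Hypothetical helper: builds the qsub command line, does not run it.
submit_cmd() {
    mem=$1
    script=$2
    # -w e : reject the job at submit time if the request cannot be satisfied
    # -l   : the resource request (h_vmem is an assumption here)
    echo "qsub -w e -l h_vmem=${mem} ${script}"
}

submit_cmd 64G job.sh
```

If the command is accepted, the job goes into the normal scheduling; if not, the user sees the error right away instead of a job that stays pending.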

-- Reuti
