[GE users] Reserving a queue for memory usage a <= b

Reuti reuti at staff.uni-marburg.de
Tue Sep 23 12:21:14 BST 2008


On 23.09.2008 at 12:30, Atle Rudshaug wrote:

> Reuti wrote:
>> Hi Aaron,
>>
>> On 23.09.2008 at 11:24, Aaron Turner wrote:
>>
>>> Hello SGE users,
>>>
>>> What I am looking to achieve is reserving some machines for  
>>> higher memory jobs.
>>>
>>> The idea would be for:
>>> 1. Jobs capable of being scheduled on the smaller memory machines  
>>> (memory usage <= a) going to those machines.
>>
>> you can set up a consumable complex (better: make virtual_free or
>> h_vmem consumable) and set a sensible value for each node in the
>> exechost definition. You then have to request this complex when you
>> submit the job.
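A minimal sketch of the setup described above, assuming h_vmem is the complex made consumable. The host names node01 and bignode01, the memory sizes and the script job.sh are hypothetical placeholders, and the commands are only printed, not executed:

```shell
# Sketch only: node01, bignode01, the sizes and job.sh are placeholders.
# Beforehand, mark h_vmem as consumable: run "qconf -mc" and set the
# "consumable" column of the h_vmem line to YES.

# Print each command instead of running it; change the body to "$@"
# to execute for real.
run() { echo "+ $*"; }

# Per-host capacity, stored in the exechost definition.
run qconf -mattr exechost complex_values h_vmem=8G node01
run qconf -mattr exechost complex_values h_vmem=64G bignode01

# The job requests the complex at submission time.
run qsub -l h_vmem=6G job.sh
```

With values like these, a 6G request fits on either host, while a 32G request can only be placed on bignode01.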
>>
>>> 2. Jobs of greater memory usage than a but less than b going to  
>>> the high memory machines.
>>> 3. Very high memory jobs we can't accommodate get rejected and the
>>> user alerted.
>>
>> See below.
>>
>>> 4. The high memory machines being kept as busy as possible.
>>
>> You can sort queue instances by setting a sequence number and set  
>> the scheduler to sort by seqno:
>>
>> seq_no 0,[@big_machines=10],[@small_machines=20]
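As a sketch of where the two settings live: the seq_no line belongs in the queue configuration, and the sort order is the queue_sort_method parameter of the scheduler configuration. The host groups are the ones named above; queue instances with lower sequence numbers are filled first.

```
# queue configuration (edited with: qconf -mq <queue>)
seq_no                0,[@big_machines=10],[@small_machines=20]

# scheduler configuration (edited with: qconf -msconf)
queue_sort_method     seqno
```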
>>
>>> 5. Any user with a job of memory usage greater than a and less  
>>> than b having the minimum wait possible before their job starts  
>>> running.
>>>
>>> In an ideal world all jobs would be checkpointable and submitted  
>>> as such so I could simply reduce the time slice down to a  
>>> shortish time and simply get the jobs rescheduled for additional  
>>> processing. This would also make the fair sharing a bit less  
>>> clumpy. But I am still not convinced the checkpointing issue is  
>>> fully solved for arbitrary code.
>>
>> There is no "issue" with SGE regarding checkpointing; it is simply
>> not designed to do it on its own. SGE will support checkpointing
>> if it's built into the application or provided by a 3rd party
>> library. It's not the intention of SGE to offer checkpointing
>> facilities.
>>
>>> So given this, what is the best way to approach it? I did try
>>> setting up a series of subordinations for the queues, with a
>>> shorter queue to absorb excess jobs without locking up the high
>>> memory machine for a long time, but it doesn't seem to operate
>>> quite as I would have hoped. Is there a better way of approaching
>>> this, such as adding a complex? The base complex relationships
>>> offer >= and <= but not a <= b <= c!
>>
>> You can submit jobs with:
>>
>> -w e
>>
>> and you will get an error message if there aren't any queues/hosts  
>> at all to satisfy the request.
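A small sketch of how a submit wrapper could surface that rejection to the user. It assumes qsub returns a non-zero exit status when -w e rejects the job; the function name submit_checked, the 6G request and job.sh are hypothetical.

```shell
# Sketch only: job.sh and the 6G request are placeholders.
# Runs the given submit command and reports whether it was accepted.
submit_checked() {
    if "$@"; then
        echo "submission accepted"
    else
        echo "submission rejected" >&2
        return 1
    fi
}

# Intended use on a cluster:
#   submit_checked qsub -w e -l h_vmem=6G job.sh
```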
>>
>> -- Reuti
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>>
>>
>
> Hi!
>
> I would like to do the same (reserve different machines according  
> to job memory needs), only through DRMAA. Especially the -w e  
> option with user feedback if no applicable hardware is available.  
> Is this possible through DRMAA?

Isn't -w e the default for DRMAA - do you observe something  
different? You should get an error code back when you try to submit.

http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=24925

-- Reuti

> - Atle

