[GE users] Resource Reservation Issue

Bradford, Matthew matthew.bradford at EDS.COM
Tue Sep 16 14:01:26 BST 2008


 

>-----Original Message-----
>From: Reuti [mailto:reuti at staff.uni-marburg.de] 
>Sent: 16 September 2008 12:53
>To: users at gridengine.sunsource.net
>Subject: Re: [GE users] Resource Reservation Issue
>
>Hi Mat,
>
>On 14.09.2008 at 12:21, Bradford, Matthew wrote:
>
>> Reuti,
>>
>> No, we used qalter once the jobs were running. We manually reduced
>> their h_rt to see whether this changes which nodes the scheduler
>> thinks will become available first, and therefore which nodes it uses
>> in the reservation for the reserving job.
>>
>> I think my understanding of h_rt was wrong. If I now understand
>> correctly, it is part of the request, stating that the job requires
>> this much run time, rather than an attribute of the job itself.
>> The scheduler then uses the h_rt value to select an appropriate
>> queue. Once the job has started, I assume that altering this value
>> has no effect.
>
>Exactly. It is also set as a ulimit for the process, hence
>it can't be changed later on.
>
>> Maybe that isn't the best way of testing this. We have tried a
>> similar test, with a cluster full of jobs and a pending job
>> requesting resource reservation of 4 slots. Even when we kill some of
>> the running jobs to free up 4 slots, the scheduler still doesn't
>> alter the reservation to use them, and if jobs lower down in the
>> pending queue are present, they can jump in and start using the newly
>> available slots.
>>
>> Basically, we have users complaining that their jobs, which are 
>> sitting at the top of the pending queue, with reservations set, are 
>> not the next jobs to execute.
>
>Jobs with reservation are not by default considered to be the
>most important ones - they are just in the queue like other
>jobs. When they are the next ones to be executed, the
>reservation should start.
>You might want to add a complex with a set urgency for this
>type of job to push them up in the waiting list.

We don't have an issue with the priority of these jobs, as the
administrator pushes them to the top of the pending list using override
tickets as agreed by a review board. The issue is that we need those
jobs to be the next ones to execute, and they aren't.
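
The expected behaviour can be written down as a toy model. The Python
sketch below is only an illustration of that expectation, not of how
SGE's scheduler actually works; Slot, plan_reservation, the node names
and the 8-slot cluster are all made up for the example, and queues,
parallel environments and tickets are ignored.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    node: str
    free_at: float  # hours from now; 0.0 means the slot is idle right now


def plan_reservation(slots, needed):
    """Pick the `needed` slots that become free earliest.

    Returns (start_time, chosen_nodes). Toy model only: a real scheduler
    considers far more than the expected end time of running jobs.
    """
    if needed > len(slots):
        raise ValueError("not enough slots in the cluster")
    # Earliest-free slots first; node name breaks ties deterministically.
    chosen = sorted(slots, key=lambda s: (s.free_at, s.node))[:needed]
    # The reserving job can only start once its last chosen slot is free.
    start = max(s.free_at for s in chosen)
    return start, [s.node for s in chosen]


# Hypothetical full cluster: every running job is assumed to end in 480 h,
# mirroring the default runtime mentioned in the thread.
cluster = [Slot(f"node{i:02d}", 480.0) for i in range(1, 9)]
start_before, nodes_before = plan_reservation(cluster, 4)

# Four running jobs are killed, so their slots are idle immediately.
for slot in cluster[4:]:
    slot.free_at = 0.0
start_after, nodes_after = plan_reservation(cluster, 4)

print(start_before, nodes_before)
print(start_after, nodes_after)
```

In this model the reservation moves to the freed slots as soon as the
plan is recomputed, which is the behaviour the users are asking for.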

Cheers,

Mat
>
>-- Reuti
>
>
>> Any thoughts would be most helpful.
>>
>> Cheers,
>>
>> Mat
>>
>>
>>> -----Original Message-----
>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>> Sent: 12 September 2008 23:02
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] Resource Reservation Issue
>>>
>>> Hi,
>>>
>>> On 11.09.2008 at 13:21, Bradford, Matthew wrote:
>>>
>>>> We are running SGE 6.0u8 and have a problem with how resource 
>>>> reservation works.
>>>>
>>>> For example:
>>>>
>>>> We have a full cluster running mainly parallel jobs of various
>>>> sizes, from 1 node to 16 nodes.
>>>> We allow only 2 jobs to be run with a Reserve flag on them.
>>>> Users aren't specifying an h_rt, and the default runtime is set to
>>>> 480 hours.
>>>> Occasionally, we set important jobs with a reserve flag to prevent 
>>>> resource starvation, and we push the job to the top of the pending 
>>>> queue.
>>>>
>>>> What we appear to get when we switch on monitoring is the scheduler
>>>> selecting the nodes for the reserved job, but then not amending that
>>>> selection even when there are free nodes in the system.
>>>>
>>>> During testing of this we noticed no difference when we explicitly
>>>> set the h_rt to 480 hours for all jobs, and then reduced the h_rt
>>>> for specific jobs. We thought that the scheduler would recalculate
>>>> and select nodes where it knows that jobs are nearly finished.
>>>>
>>> What do you mean by "reducing" - the waiting jobs? With qalter while
>>> they are waiting?
>>>
>>> -- Reuti
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>>
