[GE users] Resource Reservation Issue

Bradford, Matthew matthew.bradford at eds.com
Sun Sep 14 11:21:40 BST 2008


Reuti,

No, we used qalter once the jobs were running. We manually reduced
their h_rt to see whether this changed which nodes the scheduler
thinks will become available first, and therefore whether it would use
those nodes in the reservation for the reserving job.
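
For reference, the kind of command we were using looked something like
this (the job ID and the new value are just examples):

    # reduce the requested hard run time of an already-running job
    # (12345 is a made-up job ID)
    qalter -l h_rt=1:00:00 12345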

I think my understanding of how h_rt is used was wrong. If I now
understand correctly, it is part of the resource request, stating that
the job requires this much run time, rather than an attribute of the
job itself. The scheduler then uses the h_rt value to select an
appropriate queue. Once the job has started, I assume that altering
this value has no effect.
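
If that's right, then h_rt really only matters at submission time,
along these lines (the script name is illustrative):

    # request a hard run-time limit and a reservation at submission;
    # the scheduler matches h_rt against the queues' limits
    qsub -R y -l h_rt=480:00:00 myjob.sh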


Maybe that isn't the best way of testing this. We have also tried a
similar test, with a cluster full of jobs and a pending job requesting
a resource reservation of 4 slots. Even after killing some of the
running jobs to free up 4 slots, the scheduler still doesn't alter the
reservation to use them, and if there are jobs lower down in the
pending queue, they can jump in and start using the newly freed slots.
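
For what it's worth, this is how we are watching the reservation
decisions (the path assumes the default cell):

    # switch on schedule monitoring: set "params MONITOR=1" in the
    # editor that qconf opens
    qconf -msconf

    # the scheduler then writes its dispatch and reservation
    # decisions to the schedule file
    tail -f $SGE_ROOT/default/common/schedule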

Basically, we have users complaining that their jobs, which sit at the
top of the pending queue with reservations set, are not the next jobs
to execute.
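
In case it's relevant, these are the scheduler settings we have been
double-checking (max_reservation is 2 for us, as mentioned before):

    # show the reservation-related scheduler parameters
    qconf -ssconf | egrep 'max_reservation|default_duration'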

Any thoughts would be most helpful.

Cheers,

Mat


>-----Original Message-----
>From: Reuti [mailto:reuti at staff.uni-marburg.de] 
>Sent: 12 September 2008 23:02
>To: users at gridengine.sunsource.net
>Subject: Re: [GE users] Resource Reservation Issue
>
>Hi,
>
>Am 11.09.2008 um 13:21 schrieb Bradford, Matthew:
>
>> We are running SGE 6.0u8 and have a problem with how resource 
>> reservation works.
>>
>> For example:
>>
>> We have a full cluster running mainly parallel jobs of various sizes 
>> from 1 node to 16 nodes.
>> We allow only 2 jobs to be run with a Reserve flag on them.
>> Users aren't specifying an h_rt, and the default runtime is set
>> to 480 hours.
>> Occasionally, we set important jobs with a reserve flag to prevent 
>> resource starvation, and we push the job to the top of the pending 
>> queue.
>>
>> What we appear to get when we switch on monitoring is the scheduler 
>> selecting the nodes for the reserved job, but then not amending that 
>> selection even when there are free nodes in the system.
>>
>> During testing of this we noticed no difference when we explicitly
>> set the h_rt to 480 hours for all jobs, and then reduce the h_rt for
>> specific jobs. We thought that the scheduler would recalculate and 
>> select nodes where it knows that jobs are nearly finished.
>>
>what do you mean by "reducing" - the waiting jobs? With qalter 
>while they are waiting?
>
>-- Reuti
>
