[GE users] auto rerun lower priority jobs when higher are waiting?

reuti reuti at staff.uni-marburg.de
Tue Aug 25 11:58:46 BST 2009


On 25.08.2009 at 08:54, jesperkrogh wrote:

> This all seems to work flawlessly except..
>
>> On 23.08.2009 at 14:15, jesperkrogh wrote:
>>
>>> Can I instruct gridengine to automatically rerun lower-priority jobs
>>> (if they are marked rerunnable) when higher-priority jobs are waiting
>>> in the queue?
>>>
>>> The majority of our computations do checkpointing and are in fact
>>> rerunnable, but sometimes a user really just wants to get a bunch of
>>> jobs on, so they send them with a higher priority. But they still
>>> have to wait for the lower-priority jobs to leave the nodes.
>>>
>>> It would be nice if gridengine just noticed that the running jobs are
>>> indeed rerunnable, so it just pulls them off and launches the
>>> higher-priority stuff.
>>
>> You will have to set up a checkpointing environment which checkpoints
>> the job when the queue gets suspended and reruns it.
>>
>> a) the high-priority jobs will need a dedicated queue (and your
>> configuration must allow the jobs to start, although resources are
>> already occupied by the low-priority jobs)
>
> The system doesn't schedule stuff onto the "highpriority" slots (all.q)
> when jobs are running in the subordinate slots of the same queue.
>
> http://krogh.cc/~jesper/all.q.txt
> http://krogh.cc/~jesper/rerunnable.q.txt
>
> Can you see the misconfiguration?

Did you restrict the total number of slots to 32 somewhere, either on
this machine or per machine?
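
Why such a limit would matter, as a rough sketch: suspended jobs keep
holding their slots, so if a host-wide limit is already used up by the
low-priority jobs, the scheduler has nothing left to dispatch the
high-priority job into, and the subordination never gets a chance to
trigger. The Python below only illustrates that arithmetic. The host
name node01, the sample text (shaped like `qconf -se` output) and the
assumption that the low-priority jobs fill all 32 slots are made up for
illustration; both function names are hypothetical helpers, not part of
Grid Engine.

```python
# Illustrative sample, shaped like `qconf -se <host>` output (made up).
SAMPLE_QCONF_SE = """\
hostname              node01
load_scaling          NONE
complex_values        slots=32
user_lists            NONE
"""


def host_slot_limit(qconf_se_text):
    """Return the host-level 'slots' value from complex_values, or None."""
    for line in qconf_se_text.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] == "complex_values":
            # complex_values is either NONE or a comma-separated name=value list
            for item in fields[1].split(","):
                name, _, value = item.strip().partition("=")
                if name == "slots" and value:
                    return int(value)
    return None


def can_dispatch(limit, slots_in_use, slots_wanted=1):
    """True if a new job still fits under the host-wide slot limit."""
    if limit is None:  # no host-level limit configured
        return True
    return slots_in_use + slots_wanted <= limit


limit = host_slot_limit(SAMPLE_QCONF_SE)
# Assumption: the low-priority jobs currently occupy all 32 slots.
print(limit, can_dispatch(limit, slots_in_use=32))
```

With a limit of 32 and 32 slots already taken, the check comes out
False; raising the limit (or removing it) makes room for the
high-priority job to start and suspend the subordinated queue.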

-- Reuti


>> b) the queue for low-priority jobs must be subordinated to the queue
>> for high-priority jobs
>
> Done.
>
>> c) a checkpointing environment (for the low-priority jobs) which will
>> abort the job on suspend, and attached to the queue for low-priority
>> jobs
>
> Works. (qmod -s <jobid> flushes the job back to the qw state.)
>
>> d) low-priority jobs must request this checkpointing environment,
>> maybe with a JSV for easy handling by the user
>>
>> There is a Howto for the checkpointing operation:
>>
>> http://gridengine.sunsource.net/howto/checkpointing.html
>
> This tutorial was excellent.
> -- 
> Jesper
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=214120
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>
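
For step d) above, a low-priority submission has to request the
checkpointing environment and be marked rerunnable. A minimal sketch of
building such a command line in Python, assuming a checkpointing
environment called rerun_on_suspend (a made-up name), a job script
job.sh (also made up) and a low-priority queue called rerunnable.q
(guessed from the file name of the posted queue configuration). The
helper low_priority_qsub is hypothetical; only the qsub options -ckpt,
-r and -q are real.

```python
import shlex


def low_priority_qsub(script, ckpt_name, queue=None):
    """Build a qsub argument list for a rerunnable, checkpointed job."""
    cmd = ["qsub",
           "-ckpt", ckpt_name,  # request the checkpointing environment
           "-r", "y"]           # mark the job as rerunnable
    if queue:
        cmd += ["-q", queue]    # optionally pin it to the low-priority queue
    cmd.append(script)
    return cmd


# All three names below are assumptions for illustration.
cmd = low_priority_qsub("job.sh", "rerun_on_suspend", queue="rerunnable.q")
print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can be handed to subprocess.run() as-is; a JSV, as
suggested in d), would add the same requests on the server side so
users do not have to remember them.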

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=214165
