[GE users] auto rerun lower priority jobs when higher are waiting?

jesperkrogh jesper at krogh.cc
Tue Aug 25 07:54:19 BST 2009


This all seems to work flawlessly except..

> On 23.08.2009 at 14:15, jesperkrogh wrote:
>
>> Can I instruct gridengine to automatically rerun lower-priority jobs
>> (if they are marked rerunnable) when higher-priority jobs are waiting
>> in the queue?
>>
>> The majority of our computations do checkpointing and are in fact
>> rerunnable, but sometimes a user really just wants to get a bunch of
>> jobs through, so they submit them with a higher priority. They still
>> have to wait for the lower-priority jobs to leave the nodes.
>>
>> It would be nice if gridengine just noticed that the running jobs
>> are indeed rerunnable, pulled them off, and launched the
>> higher-priority jobs.
>
> You will have to set up a checkpointing environment which checkpoints
> the job when the queue gets suspended and reruns it.
>
> a) the high-priority jobs will need a dedicated queue (and your
> configuration must allow the jobs to start, although resources are
> already occupied by the low-priority jobs)

The system doesn't schedule jobs onto the "highpriority" slots (all.q)
while jobs are running in the subordinate slots of the same queue.

http://krogh.cc/~jesper/all.q.txt
http://krogh.cc/~jesper/rerunnable.q.txt

Can you see the misconfiguration?

> b) the queue for low-priority jobs must be subordinated to the queue
> for high-priority jobs

Done.
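
For reference, subordination is declared on the superordinate queue
through its subordinate_list attribute. The following is a minimal sketch
for checking that attribute, assuming the queue names all.q and
rerunnable.q from the links above. The helper name `subordinates` and the
sample value "rerunnable.q=1" are illustrative assumptions, not taken from
the actual configuration.

```shell
#!/bin/sh
# Print the value of subordinate_list from a queue configuration on stdin.
# `subordinates` is a hypothetical helper name, not part of Grid Engine.
subordinates() {
    awk '$1 == "subordinate_list" { $1 = ""; sub(/^ +/, ""); print }'
}

# On a cluster one would feed it the real configuration:
#   qconf -sq all.q | subordinates
# Illustrative sample input (the value "rerunnable.q=1" is an assumption):
printf 'qname all.q\nsubordinate_list rerunnable.q=1\n' | subordinates
```

With a threshold of 1, the rerunnable.q instance on a host would be
suspended as soon as one slot of all.q is in use on that same host.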

> c) a checkpointing environment (for the low-priority jobs) which will
> abort the job on suspend, and attached to the queue for low-priority
> jobs

Works: qmod -s <jobid> puts the job back into the qw (queued, waiting)
state.
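
A checkpointing environment with this behaviour might look like the
sketch below. The name rerun_on_suspend, the ckpt_dir, and the exact
`when` flags are assumptions for illustration; the attribute names follow
checkpoint(5), where `x` in `when` refers to the job being suspended.

```shell
#!/bin/sh
# Write a hypothetical checkpointing environment definition to a temp file.
# All values below are assumptions; adapt them to the local setup.
ckpt_file=$(mktemp)
cat > "$ckpt_file" <<'EOF'
ckpt_name        rerun_on_suspend
interface        APPLICATION-LEVEL
ckpt_command     none
migr_command     none
restart_command  none
clean_command    none
ckpt_dir         /tmp
signal           none
when             xsr
EOF
cat "$ckpt_file"

# A Grid Engine manager could then register it and attach it to the queue:
#   qconf -Ackpt "$ckpt_file"
#   qconf -mq rerunnable.q     # add rerun_on_suspend to ckpt_list
```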

> d) low-priority jobs must request this checkpointing environment,
> maybe with a JSV for easy handling by the user
>
> There is a Howto for the checkpointing operation:
>
> http://gridengine.sunsource.net/howto/checkpointing.html

This tutorial was excellent.
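
Short of writing a JSV, a small submit wrapper can make step d) easy for
users. This is only a sketch: the function name submit_low_prio, the
DRY_RUN switch, and the default names rerun_on_suspend and rerunnable.q
are assumptions, while -ckpt, -r and -q are standard qsub options.

```shell
#!/bin/sh
# Hypothetical wrapper: submit a job as rerunnable under a checkpointing
# environment. CKPT_ENV and LOW_QUEUE are assumed names; adjust per site.
CKPT_ENV=${CKPT_ENV:-rerun_on_suspend}
LOW_QUEUE=${LOW_QUEUE:-rerunnable.q}

submit_low_prio() {
    # With DRY_RUN=1 the command line is only printed, not executed.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo qsub -ckpt "$CKPT_ENV" -r y -q "$LOW_QUEUE" "$@"
    else
        qsub -ckpt "$CKPT_ENV" -r y -q "$LOW_QUEUE" "$@"
    fi
}

# Show what would be submitted without contacting the cluster.
DRY_RUN=1
submit_low_prio job.sh
```

Unsetting DRY_RUN (or setting it to 0) makes the wrapper call qsub for
real; any further arguments are passed through unchanged.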
-- 
Jesper
