[GE users] auto rerun lower priority jobs when higher are waiting?

reuti reuti at staff.uni-marburg.de
Sun Aug 23 22:43:14 BST 2009


On 23.08.2009 at 14:15, jesperkrogh wrote:

> Can I instruct gridengine to automatically rerun lower-priority jobs
> if higher-priority ones are waiting in the queue (if they are marked
> rerunnable)?
> The majority of our computations do checkpointing and are in fact
> rerunnable, but sometimes a user really just wants to get a bunch of
> jobs on, so they submit them with a higher priority. But they still
> have to wait for the lower-priority jobs to leave the nodes.
> It would be nice if gridengine just noticed that the running jobs
> indeed are rerunnable, so it just pulls them off and launches the
> higher-priority stuff.

You will have to set up a checkpointing environment that checkpoints
the job when the queue gets suspended and then reruns it.

a) the high-priority jobs will need a dedicated queue (and your
configuration must allow the jobs to start, although resources are
already occupied by the low-priority jobs)

b) the queue for low-priority jobs must be subordinated to the queue  
for high-priority jobs

c) a checkpointing environment (for the low-priority jobs) which will
abort the job on suspend, attached to the queue for low-priority jobs

d) low-priority jobs must request this checkpointing environment,  
maybe with a JSV for easy handling by the user
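
Steps a) to d) could be sketched roughly as follows. This is only an
untested outline: the queue names high.q and low.q, the environment
name preempt_ckpt, the file path and job.sh are placeholders, and the
"interface" and command settings are assumptions that depend on how
your jobs actually checkpoint. Whether a suspended job really gets
aborted and rescheduled depends on those settings, so check them
against checkpoint(5) and queue_conf(5) for your version.

```shell
# All names below (high.q, low.q, preempt_ckpt, job.sh) are hypothetical.

# a) high.q must have its own slots on the nodes, so that its jobs can
#    start while low.q jobs still occupy theirs (not shown here).

# c) Define a checkpointing environment that acts on suspend ("when x").
#    interface and the *_command entries are assumptions; adjust them
#    to the checkpointing method your jobs use.
cat > /tmp/preempt_ckpt.conf <<'EOF'
ckpt_name          preempt_ckpt
interface          application-level
ckpt_command       none
migr_command       none
restart_command    none
clean_command      none
ckpt_dir           /tmp
signal             none
when               x
EOF
qconf -Ackpt /tmp/preempt_ckpt.conf

# c) Attach the environment to the low-priority queue.
qconf -aattr queue ckpt_list preempt_ckpt low.q

# b) Subordinate low.q to high.q: low.q is suspended on a host as soon
#    as one slot of high.q is in use there.
qconf -mattr queue subordinate_list low.q=1 high.q

# d) Low-priority jobs request the checkpointing environment.
qsub -q low.q -ckpt preempt_ckpt job.sh
```

The threshold in "low.q=1" is only one possible choice; a higher value
would suspend low.q later.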

There is a Howto for the checkpointing operation:


and a nice state diagram in:


See also the man pages "sge_ckpt" and "checkpoint". Note: with the
"when x" setting, no checkpoint is taken on migration (checkpoints are
only written at the time interval, "when m"); the documentation is
wrong on this point.

-- Reuti

