[GE users] auto rerun lower priority jobs when higher are waiting?
reuti at staff.uni-marburg.de
Sun Aug 23 22:43:14 BST 2009
On 23.08.2009 at 14:15, jesperkrogh wrote:
> Can I instruct gridengine to automatically rerun lower-priority
> jobs if higher-priority ones are waiting in the queue (if they are
> marked rerunnable)?
> The majority of our computations do checkpointing and are
> rerunnable, but sometimes a user really just wants to get a bunch of
> jobs on, so they send them with a higher priority. But they still have
> to wait for the lower-priority jobs to leave the nodes.
> It would be nice if gridengine just noticed that the running jobs
> are rerunnable, so it just pulls them off and launches the higher
> priority stuff.
You will have to set up a checkpointing environment which aborts the
job when its queue gets suspended, so that the job is rerun later:
a) the high-priority jobs will need a dedicated queue (and your
configuration must allow these jobs to start, although resources are
already occupied by the low-priority jobs)
b) the queue for low-priority jobs must be subordinated to the queue
for high-priority jobs
c) a checkpointing environment (for the low-priority jobs) which will
abort the job on suspend must be attached to the queue for
low-priority jobs
d) low-priority jobs must request this checkpointing environment,
maybe with a JSV for easy handling by the user
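
As a rough sketch of a) to d), the relevant configuration entries might
look like the following. The names high.q, low.q and lowprio_ckpt, the
script path and the chosen values are placeholders I made up for
illustration, and the lines starting with "#" are only my annotations,
not part of the configuration. The field names are the ones listed in
checkpoint(5) and queue_conf(5); please check them against your version
before using anything from here.

```
# checkpointing environment, as shown by "qconf -sckpt lowprio_ckpt"
# (assumption: the migr_command script is what actually stops the job)
ckpt_name          lowprio_ckpt
interface          application-level
ckpt_command       none
migr_command       /path/to/abort_job.sh
restart_command    none
clean_command      none
ckpt_dir           /tmp
signal             none
when               x

# in the high-priority queue ("qconf -mq high.q"):
# suspend low.q on a host as soon as one slot of high.q is used there
subordinate_list   low.q=1

# in the low-priority queue ("qconf -mq low.q"):
# make the checkpointing environment available to its jobs
ckpt_list          lowprio_ckpt
```

A low-priority job would then request the environment at submission
time, for example with "qsub -ckpt lowprio_ckpt job.sh" (job.sh being a
placeholder as well).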
There is a Howto for the checkpointing operation:
and a nice state diagram in:
See also the man pages "sge_ckpt" and "checkpoint". Note: with the
"when x" setting, no checkpoint will be written on migration
(checkpoints are only written at time intervals, i.e. with "when m");
the documentation is wrong on this point.