[GE users] Setting a maintenance window
kamil at zymeworks.com
Mon Feb 8 23:12:09 GMT 2010
> On 08.02.2010 at 20:29, kisielk wrote:
> > I'm currently working on a project to migrate our cluster from
> > Torque + Maui to Sun Grid Engine.
> > One thing I can't quite figure out how to set up in SGE is to have
> > a maintenance window on the cluster. In our current Maui system I
> > can set up an advance reservation on the entire cluster to occur
> > some time in the future. Currently running jobs will not be
> > impacted, even if their runtime would overlap into that time
> > period. New jobs are scheduled around the reservation, including
> > backfill of lower priority jobs that can still fit in the remaining
> > time window if higher priority jobs cannot.
> > What facility/facilities would I use in SGE to accomplish the same
> > thing?
> You can also submit an advance reservation (AR) in SGE (qrsub); it
> would just need to request a PE and all available slots in the system.
> New jobs with a runtime greater than the remaining timeframe won't
> start, and backfilling would also work. But when there are already
> running jobs with a longer h_rt request that wouldn't fit into the
> remaining time, then your AR won't be granted.
> Does an AR in Maui not
> honor the already running jobs, i.e. you can request all slots in the
> cluster although they are used by running jobs already?
They don't honour the already running jobs. Of course, you will not be able to submit a job that utilizes the resources already in use during the reservation period. However, since this is for maintenance, this is not a problem in practice. I suppose the reason this isn't actually enforced is that in most cases jobs never reach their requested walltime anyway.
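For reference, the qrsub suggestion quoted above might look roughly like the sketch below. It only builds the command line and does not run it. The PE name (maint_pe), the slot count (256), and the date are made-up placeholders, and I am assuming the usual qrsub options -N (name), -a (start time as CCYYMMDDhhmm), -d (duration as h:mm:ss), and -pe (parallel environment and slots); check qrsub(1) on your installation before relying on it.

```python
from datetime import datetime, timedelta


def qrsub_command(start, duration, pe_name, total_slots, name="maintenance"):
    """Build the argv list for a cluster-wide advance reservation.

    pe_name and total_slots are site-specific; the values used below
    are hypothetical placeholders, not taken from the thread.
    """
    # Start time in the CCYYMMDDhhmm form accepted by qrsub -a.
    start_arg = start.strftime("%Y%m%d%H%M")

    # Duration in h:mm:ss form for qrsub -d.
    total_seconds = int(duration.total_seconds())
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    duration_arg = f"{hours}:{minutes:02d}:{seconds:02d}"

    return [
        "qrsub",
        "-N", name,
        "-a", start_arg,
        "-d", duration_arg,
        "-pe", pe_name, str(total_slots),
    ]


# Hypothetical 8-hour window on 20 Feb 2010 covering 256 slots.
cmd = qrsub_command(datetime(2010, 2, 20, 8, 0), timedelta(hours=8),
                    "maint_pe", 256)
print(" ".join(cmd))
```

As noted in the quoted reply, SGE would refuse such a reservation if running jobs with a longer h_rt request still overlap the window.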
> What should happen during the maintenance timeframe with the already
> running jobs?
At our site we usually take care of these manually on a case-by-case basis. Usually this means just deleting them, but sometimes special precautions must be taken depending on the user and application.
> Another option besides an AR is to define a calendar for all queues
> which should be switched off at this certain time. This looks more
> like the behavior you expect: like with the AR shorter running jobs
> would start, backfilling will work, and already running jobs would
> continue until the queues are drained (resource reservation in the
> scheduler configuration must be switched on for this to work: entry
> max_reservation > 0).
This sounds like it might be a better path to get my intended behavior. I will have to read up more on how calendars work.
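As a starting point, a calendar for a one-day maintenance window might be generated along the lines of the sketch below. The calendar name, date, and hours are hypothetical, and the year entry follows the day.month.year=hour-hour=state form as I understand it from calendar_conf(5), so treat the exact syntax as an assumption to verify against the man page.

```python
from datetime import date


def calendar_config(name, day, start_hour, end_hour):
    """Return calendar text that switches queues off during one window.

    The layout (calendar_name / year / week) and the year entry syntax
    are assumptions based on calendar_conf(5); verify before use.
    """
    # Day of month, month, and year without zero padding,
    # followed by the hour range and the queue state.
    year_entry = (
        f"{day.day}.{day.month}.{day.year}"
        f"={start_hour}-{end_hour}=off"
    )
    return "\n".join([
        f"calendar_name    {name}",
        f"year             {year_entry}",
        "week             NONE",
    ])


# Hypothetical window: 20 Feb 2010, 08:00 to 18:00.
config = calendar_config("maintenance", date(2010, 2, 20), 8, 18)
print(config)

# Presumed follow-up steps (not executed here):
#   qconf -Acal <file containing the text above>
#   qconf -mattr queue calendar maintenance all.q   # all.q is a placeholder
#   qconf -msconf                                   # set max_reservation > 0
```

The last comment reflects the point from the quoted reply that resource reservation must be enabled in the scheduler configuration for the queues to drain as described.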