[GE users] Setting a maintenance window

kisielk kamil at zymeworks.com
Mon Feb 8 23:12:09 GMT 2010

> Hi,
> On 08.02.2010 at 20:29, kisielk wrote:
> > I'm currently working on a project to migrate our cluster from  
> > Torque + Maui to Sun Grid Engine.
> >
> > One thing I can't quite figure out how to set up in SGE is to have  
> > a maintenance window on the cluster. In our current Maui system I  
> > can set up an advance reservation on the entire cluster to occur  
> > some time in the future. Currently running jobs will not be  
> > impacted, even if their runtime would overlap into that time  
> > period. New jobs are scheduled around the reservation, including  
> > backfill of lower priority jobs that can still fit in the remaining  
> > time window if higher priority jobs cannot.
> >
> > What facility/facilities would I use in SGE to accomplish the same  
> > thing?
> you can also submit an advance reservation (AR) in SGE (qrsub), it  
> would just need to request a PE and all available slots in the system.
> New jobs with a runtime greater than the remaining timeframe won't  
> start, and backfilling would also work. But when there are already  
> running jobs with a longer h_rt request that wouldn't fit into the  
> remaining time, then your AR won't be granted.
> Does an AR in Maui not  
> honor the already running jobs, i.e. you can request all slots in the  
> cluster although they are used by running jobs already?
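
For reference, the AR suggestion above could be sketched roughly as follows. This is only an illustration that assembles a qrsub command line without running it. The PE name "maint_pe", the slot count 256, and the date are made-up placeholders, and the PE is assumed to span every queue you want to reserve.

```python
from datetime import datetime, timedelta
import shlex


def qrsub_command(start, duration, pe_name, slots, name="maintenance"):
    """Build (but do not run) a qrsub command reserving `slots` slots of a PE."""
    total = int(duration.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return [
        "qrsub",
        "-N", name,                                     # name of the reservation
        "-a", start.strftime("%Y%m%d%H%M"),             # start time, CCYYMMDDhhmm
        "-d", f"{hours}:{minutes:02d}:{seconds:02d}",   # duration, hh:mm:ss
        "-pe", pe_name, str(slots),                     # PE and number of slots
    ]


# Hypothetical values: a 10 hour window on 13 Feb 2010 covering 256 slots.
cmd = qrsub_command(datetime(2010, 2, 13, 8, 0), timedelta(hours=10), "maint_pe", 256)
print(shlex.join(cmd))
```

As noted above, such a request would only be granted if no running job has an h_rt request reaching into the window.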

No, Maui reservations don't honour the already running jobs. Of course, you will not be able to run a job in the reservation on resources that are still in use during the reservation period. However, since this is for maintenance, that is not a problem in practice. I suppose this isn't actually enforced because in most cases jobs never reach their requested walltime anyway.

> What should happen during the maintenance timeframe with the already  
> running jobs?

At our site we handle these manually on a case-by-case basis. Usually this means just deleting them, but sometimes special precautions must be taken depending on the user and application.

> Another option besides an AR is to define a calendar for all queues  
> which should be switched off at this certain time. This looks more  
> like the behavior you expect: like with the AR shorter running jobs  
> would start, backfilling will work, and already running jobs would  
> continue until the queues are drained (resource reservation in the  
> scheduler configuration must be switched on for this to work: entry  
> max_reservation > 0).

This sounds like it might be a better path to get my intended behavior. I will have to read up more on how calendars work.
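
As a rough sketch of what such a calendar might look like, the snippet below generates a calendar definition in the layout I understand calendar_conf(5) to use (calendar_name / year / week, with a year entry of the form day.month.year=hours=state). The calendar name "maint" and the date and hours are placeholders, and the exact syntax should be checked against the man page before relying on it.

```python
from datetime import date


def maintenance_calendar(name, day, start_hour, end_hour):
    """Return calendar text that switches queues off for one window on `day`."""
    if not 0 <= start_hour < end_hour <= 23:
        raise ValueError("expected 0 <= start_hour < end_hour <= 23")
    # year entry: <day>.<month>.<year>=<from hour>-<to hour>=<state>
    year_entry = f"{day.day}.{day.month}.{day.year}={start_hour}-{end_hour}=off"
    return (
        f"calendar_name    {name}\n"
        f"year             {year_entry}\n"
        f"week             NONE\n"
    )


# Hypothetical window: 13 Feb 2010, 08:00 to 18:00.
text = maintenance_calendar("maint", date(2010, 2, 13), 8, 18)
print(text, end="")
```

The generated text could then presumably be loaded from a file with qconf -Acal and attached to a queue with something like qconf -mattr queue calendar maint all.q, with max_reservation set above 0 in the scheduler configuration as mentioned in the quoted text. The queue name all.q is an assumption about the local setup.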

