[GE users] exclusive host use and subordinate queue
ajw at illinois.edu
Fri Dec 18 15:46:12 GMT 2009
> On 17.12.2009 at 16:27, ajw wrote:
> > I have 2 queues set in grid engine. A normal priority queue and a
> > low priority queue which is subordinate to the normal priority queue.
> > In normal (non-exclusive) use I have it configured so when a job is
> > submitted to the normal priority queue any jobs running in the low
> > priority queue will get killed and resubmitted.
> > The problem comes when a normal priority job is submitted with
> > excl=true and a low priority job is already running. The normal
> > priority job won't start in this situation. I get this message:
> > (-l exclusive=true) cannot run at host "xxx" because exclusive
> > resource (exclusive) is already in use
> > Is there any way to change this behavior?
> No. The problem is similar to a license held by an already running job:
> once a job is scheduled, SGE will never consider it for something
> like rescheduling. I assume you use a custom terminate_method to
> reschedule the job? SGE can't know this, and hence the resources are
> still in use.
Yes, that does seem to be the same problem.
It is a custom suspend_method, not a terminate_method, but that doesn't really matter.
> What you can try: set up a special queue for exclusive jobs (one
> slot) and don't request any resources in the qsub command.
> Subordinated to this exclusive.q: normal.q and low.q. In the
> normal.q, you have to subordinate exclusive.q (so, either 1, 2,
> 3, ... are running in the normal.q or only one in the exclusive.q).
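A minimal sketch of the queue layout suggested above, written as excerpts in the style of "qconf -sq" output. The lines starting with # are annotations for the reader, not part of the qconf format. The queue names come from the suggestion; the "=1" thresholds (suspend the subordinate queue as soon as one slot is in use) are an assumption and should be checked against queue_conf(5) before use.

```
# exclusive.q: a single slot per host; a job running here suspends
# both other queues on that host
qname              exclusive.q
slots              1
subordinate_list   normal.q=1,low.q=1

# normal.q: any job running here suspends exclusive.q (and low.q, as before)
qname              normal.q
subordinate_list   exclusive.q=1,low.q=1
```

With a layout like this, an exclusive job would be submitted with something like "qsub -q exclusive.q job.sh" (job.sh is a placeholder) and without "-l exclusive=true", so the exclusive complex never enters the scheduling decision.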
Before exclusive scheduling was available, I used a PE with its
allocation_rule set to $fill_up: users requested as many slots as the
machine has, which gave them exclusive access to the machine and kicked
off the low priority jobs. I can tell them to go back to that method for
single-machine jobs. It's not quite as nice as just requesting exclusive
access, though.
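For reference, a PE of the kind described above might look roughly like this excerpt in the style of "qconf -sp" output. The PE name whole_host and the slot total are hypothetical; only the $fill_up allocation rule is taken from the description.

```
pe_name            whole_host
slots              999
allocation_rule    $fill_up
```

On a machine with 8 slots (an assumed figure), a user would then request the whole machine with something like "qsub -pe whole_host 8 job.sh", where job.sh is a placeholder.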
But that workaround does not behave the same way for actual parallel
jobs, and I think exclusive scheduling is a good feature there because it
ensures a parallel job lands on the minimum number of nodes. I don't know
of another way to configure that.