[GE users] exclusive host use and subordinate queue
reuti at staff.uni-marburg.de
Sat Dec 19 16:49:54 GMT 2009
On 18.12.2009 at 16:46, ajw wrote:
>> On 17.12.2009 at 16:27, ajw wrote:
>>> I have 2 queues set in grid engine. A normal priority queue and a
>>> low priority queue which is subordinate to the normal priority queue.
>>> In normal (non-exclusive) use I have it configured so when a job is
>>> submitted to the normal priority queue any jobs running in the low
>>> priority queue will get killed and resubmitted.
>>> The problem comes when a normal priority job is submitted with
>>> excl=true and a low priority job is already running. The normal
>>> priority job won't start in this situation. I get this message:
>>> (-l exclusive=true) cannot run at host "xxx" because exclusive
>>> resource (exclusive) is already in use
>>> Is there any way to change this behavior?
>> No. The problem is similar to wanting a license that is held by an
>> already running job:
>> Once a job is scheduled, SGE will never consider it for something
>> like rescheduling. I assume you use a custom terminate_method to
>> reschedule the job? SGE can't know this, and hence the resources are
>> still in use.
> Yes, that does seem to be the same problem.
> It is a custom suspend_method not terminate, but that doesn't
> really matter.
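To make the accounting behind that message concrete, here is a toy
model in Python. It is only a sketch and not SGE code: the Job and Host
classes are made up for illustration. It assumes, as described above,
that every dispatched job keeps counting against the host, whether it
is running or has been suspended by a custom method.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    exclusive: bool = False
    state: str = "running"  # "running" or "suspended"

@dataclass
class Host:
    name: str
    jobs: list = field(default_factory=list)

    def can_start(self, job):
        # Every dispatched job counts, regardless of its state.
        if job.exclusive:
            return not self.jobs
        return not any(j.exclusive for j in self.jobs)

    def start(self, job):
        if not self.can_start(job):
            return False
        self.jobs.append(job)
        return True

host = Host("xxx")
low = Job("low_prio")
host.start(low)
low.state = "suspended"  # what a custom suspend_method would achieve

excl = Job("normal_excl", exclusive=True)
blocked = not host.can_start(excl)   # host still counts as occupied

host.jobs.remove(low)                # the job has really left the host
free_again = host.can_start(excl)
```

In this model, suspending the low priority job changes nothing for the
exclusive request. Only the job actually leaving the host frees it.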
Besides a consumable complex with the (new) attribute "JOB" (the job
needs it once per job), the original discussion also proposed an
entry "HOST" (the job needs it once per host, regardless of the number
of slots granted there). I don't know whether this is still on the
agenda. But if it were available, the normal priority jobs could
just request it with amount "1", and one would be available per
host. The low priority jobs wouldn't request it, so a newly
started normal priority job could push them off the machine with your
current setup.
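As a rough illustration of that idea, the per-host bookkeeping could be
sketched like this in Python. This is a toy model under assumptions:
the "HOST" semantics are only the proposal mentioned above, not a
confirmed SGE feature, and the function name and the capacity of one
token per host are made up for the example.

```python
# Toy model of a per-host consumable; not SGE code.
HOST_CAPACITY = 1  # assumed: one token available per host

def fits(requests_on_host, new_request, capacity=HOST_CAPACITY):
    """Return True if the new job's per-host request still fits."""
    return sum(requests_on_host) + new_request <= capacity

low_prio_request = 0   # low priority jobs do not request the token
normal_request = 1     # normal priority jobs request amount 1

# A low priority job is already on the host: the normal job still fits,
# so it can be started and subordination can push the low job out.
normal_can_start = fits([low_prio_request], normal_request)

# A second normal priority job does not fit next to the first one.
second_normal_can_start = fits(
    [low_prio_request, normal_request], normal_request
)
```

The point is that the low priority job never holds the token, so it
cannot block the normal priority job the way the exclusive resource
does.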
>> What you can try: having a special queue for exclusive jobs (one
>> slot) and don't request any resources in the qsub command.
>> Subordinated to this exclusive.q: normal.q and low.q. In the
>> normal.q, you have to subordinate exclusive.q (so, either 1, 2,
>> 3, ... are running in the normal.q or only one in the exclusive.q).
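The queue relationships suggested above can be sketched as a small
Python model. It is only an illustration, not SGE code: the function
name and the threshold of 1 used slot are assumptions. In a real setup
the mapping would correspond to the subordinate_list entry in each
queue configuration.

```python
# Which queues get suspended on a host once a queue has jobs there.
# Threshold 1 (suspend as soon as one slot is used) is an assumption.
SUBORDINATES = {
    "exclusive.q": {"normal.q": 1, "low.q": 1},
    "normal.q": {"exclusive.q": 1, "low.q": 1},  # low.q as in the existing setup
    "low.q": {},
}

def suspended_queues(used_slots):
    """Return the set of queues suspended for the given slot usage."""
    suspended = set()
    for queue, subs in SUBORDINATES.items():
        for sub, threshold in subs.items():
            if used_slots.get(queue, 0) >= threshold:
                suspended.add(sub)
    return suspended

# Jobs in normal.q keep exclusive.q (and low.q) suspended on that host.
while_normal_runs = suspended_queues({"normal.q": 2})

# One job in exclusive.q suspends normal.q and low.q.
while_exclusive_runs = suspended_queues({"exclusive.q": 1})
```

Because each of normal.q and exclusive.q suspends the other, a host
either runs jobs in normal.q or the single job in exclusive.q, which is
the intended effect.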
> Before exclusive scheduling, I just made a PE that was set to
> fillup, so users just needed to request the number of slots
> available on the machine and they would get exclusive access to the
> machine and kick off the low priority jobs. I can tell them to
> revert to that method for single machine jobs. It's not quite as
> nice as just requesting exclusive access, though.
> But exclusive scheduling for actual parallel jobs won't work the
> same, and I think that is a good feature to ensure a parallel job
> gets on the minimum number of nodes. I don't know of another way
> to configure that.
Setting up preemption is always tricky in SGE, as it's not foreseen
in its implementation to take care of already started jobs once they
> To unsubscribe from this discussion, e-mail: [users-
> unsubscribe at gridengine.sunsource.net].