[GE users] consumable license complexes and preemption
reuti at staff.uni-marburg.de
Wed Nov 17 14:18:11 GMT 2010
On 17.11.2010 at 15:12, sgenedharvey wrote:
>> From: reuti [mailto:reuti at staff.uni-marburg.de]
>> maybe a forced advance reservation would do. The
>> resource reservation suspends the running job(s), but doesn't consume
> An advance reservation ... not a bad idea ...
> I talked about complexities of resuming a suspended job. Making sure it
> will only resume at the right time ... not before its priority, and not
> waiting for things of lower priority etc.
> Perhaps if you cloned the resource requirements of the suspended job, and
> submitted a job just like it, which is only a "resume" job. So the job in
> queue will obtain the necessary resources at the right time, and the
> suspended job resumes, and the "resume" job disappears. Perfect.
> The only problem I can think of is ...
> Suppose you've got a medium-priority job running on SystemX, which consumes
> ResourceA and ResourceB. You suspend it in order to make way for a high
> priority job. You create a medium-priority "resume" job which requires a
> slot on SystemX, one ResourceA, and one ResourceB. You queue up a million
> low-priority jobs that require ResourceA, and another million low-priority
> jobs that require ResourceB. Now the problem is ... the "resume" cannot
> happen until there is a coincidence, SystemX, ResourceA, and ResourceB must
> all be available at some particular dispatch interval. If these resources
> are freed up one at a time ... then the low priority jobs will continually
> keep grabbing whichever one is available, and preventing the coincidence of
> all three... thus preventing the medium priority resume from taking place.
> But let's not merge two separate problems. Even today, if you have a medium
> priority job requiring 2 different resources, and a million low-priority
> jobs which only consume one ... the low priority jobs will also prevent the
> med pri jobs from running. So that issue is really separate and independent
> from the idea of resuming jobs. It's already present; it always has been; I
> don't hear anybody complaining about it.
Do you also observe this when you submit the job that requires two different resources with "-R y"? Resource reservation is not limited to slots for parallel jobs; it can collect anything your job needs. The only current pitfall is the combination of "default_duration unlimited" and no time limit on the jobs that require only one resource: SGE then judges "unlimited" to be smaller than "unlimited" and starts backfilling. To prevent this, I set "default_duration" to 9999:00:00.
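
The starvation described in the quoted text, and what a reservation changes, can be illustrated with a toy model. This is a minimal sketch and not SGE's scheduler: the names Job, simulate and demo are hypothetical, each resource has exactly one unit, time advances in integer ticks, and a reservation simply withholds the reserved resources from lower-priority jobs (no backfilling at all, which is a simplification of what "-R y" does).

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int          # higher value is considered first
    needs: frozenset       # resource names, one unit of each
    duration: int          # ticks the job runs once started
    reserve: bool = False  # loosely analogous to submitting with "-R y"


def simulate(pending, busy_until, ticks):
    """Return {job name: start tick} for a toy single-unit scheduler.

    busy_until maps each resource name to the first tick at which it is free.
    """
    pending = sorted(pending, key=lambda j: -j.priority)
    busy_until = dict(busy_until)
    started = {}
    for now in range(ticks):
        free = {r for r, t in busy_until.items() if t <= now}
        reserved = set()
        for job in list(pending):
            if job.needs <= free - reserved:
                # Everything the job needs is free and not reserved: start it.
                for r in job.needs:
                    busy_until[r] = now + job.duration
                free -= job.needs
                started[job.name] = now
                pending.remove(job)
            elif job.reserve:
                # Withhold this job's resources from lower-priority jobs.
                reserved |= job.needs
    return started


def demo(reserve, n_low=20, ticks=10):
    """One medium job needing both resources vs. many single-resource jobs."""
    jobs = [Job("medium", 5, frozenset({"ResourceA", "ResourceB"}), 1, reserve)]
    for i in range(n_low):
        jobs.append(Job(f"lowA{i}", 1, frozenset({"ResourceA"}), 2))
        jobs.append(Job(f"lowB{i}", 1, frozenset({"ResourceB"}), 2))
    # The two resources become free one at a time, at ticks 1 and 2.
    return simulate(jobs, {"ResourceA": 1, "ResourceB": 2}, ticks)


if __name__ == "__main__":
    print("without reservation:", demo(False).get("medium"))
    print("with reservation:   ", demo(True).get("medium"))
```

In this model the medium job never starts within the 10 simulated ticks when it does not reserve, because the low-priority jobs keep taking whichever resource is freed. With the reservation it starts at tick 2, as soon as both resources are free together.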
> But I still agree, the idea of "reserving" resources is a good idea.
> Although nobody's complaining about the above resource contention problem
> yet ... suspending & resuming jobs could be the ingredient which tips the
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].