[GE users] consumable license complexes and preemption

jtseng_sf jtseng at sandforce.com
Tue Nov 16 15:55:52 GMT 2010

More refinement:

A variation of the workaround would be to submit a dummy job that does 
not itself go into a master/subordinate queue.
Instead, the dummy job would simply suspend the slave job with qmod.
When the master job finishes, the dummy job would resume the slave job 
with qmod.

This would simplify the queue configuration and remove the one-to-one 
master/slave job mapping that would need to happen inside of SGE.
(I'm not sure slotwise preemption would work here - it's not clear to me 
how to specify which "slot" is chosen for preemption)
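
As a rough sketch of what the dummy job's script could look like (not tested against a live cluster): it suspends the slave job with "qmod -sj", polls "qstat -j" until the master job is gone, then resumes the slave with "qmod -usj". The function name, the job ids and the polling interval are made up for illustration, and I'm assuming that "qstat -j" exits non-zero once the job has left the system and that the account running the dummy job is allowed to qmod the slave job.

```python
import subprocess
import time


def run_command(cmd):
    """Run a command quietly and return its exit status."""
    return subprocess.call(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )


def hold_slave_while_master_runs(slave_id, master_id, run=run_command,
                                 poll_seconds=30, sleep=time.sleep):
    """Suspend the slave job, wait for the master job to end, resume the slave.

    Hypothetical helper: slave_id and master_id are SGE job ids that the
    caller (for example an external load sensor) has already chosen.
    """
    # Suspend the slave job so that it stops using its license.
    run(["qmod", "-sj", str(slave_id)])
    try:
        # Assumption: "qstat -j <id>" returns 0 while the job still exists.
        while run(["qstat", "-j", str(master_id)]) == 0:
            sleep(poll_seconds)
    finally:
        # Always resume the slave job, even if the polling loop fails.
        run(["qmod", "-usj", str(slave_id)])
```

The dummy job would call something like hold_slave_while_master_runs(1234, 5678) with the two job ids passed in at submit time. Note that this only handles the suspend/resume part; SGE would still count the suspended job's license as consumed.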


A second workaround is to have an external load sensor do the suspen

On 11/16/2010 7:43 AM, John Tseng wrote:
> Hi Reuti,  your comments are very valid.
> As both sgenedharvey and yourself have pointed out, the preemption of
> licenses (as opposed to machines) is not a "simple" operation.
> The "don't count certain consumables when a job is suspended" patch is
> only a building block to get to a "look-ahead" like feature.
> In the past, I've artificially increased the number of licenses to allow
> a "master" job to subordinate/preempt a "slave" job.
> This requires a lot of complexity to make sure that the "one" master job
> preempts the one slave job and that no rogue job gets through.
> This can be done using per host per cpu slot "host queues" and using
> queue thresholds to open/close specific host queues.
> However, the complexity is enormous.
> I'd like to simplify.
> One workaround is to have a "dummy" job actually do the "slave"
> subordination instead of the actual "master" job.
> An external load sensor would determine when to allow
> subordination/preemption and submit the dummy job.
> Since the dummy job would cause the license to be freed, then the master
> job can take the license.
> The dummy job would quit after the master job has finished.  The slave
> job would then resume.
> In this scenario, it is not necessary that the "master" job run on the
> same machine as the "slave" job.
> The only issue left is that SGE sees the "slave" job still consuming a
> license.
> sgenedharvey points out the amount of complexity that can occur in
> determining which "slave" job to subordinate, but that is left as an
> exercise to the reader :)
> If the "don't count certain consumables when a job is suspended" patch
> is implemented, then the workaround is much more straightforward.
> Future patches can build upon this.
> Perhaps the scheduler can account for subordination like it does for
> reservation - but I haven't reviewed the code nor do I understand the
> code yet.
> -john
> On 11/16/2010 3:47 AM, reuti wrote:
>> Hi,
>> Am 16.11.2010 um 00:15 schrieb jtseng_sf:
>>> Hi Everyone,  I'm thinking of patching 6.2u5 to allow certain
>>> consumables to be NOT counted by SGE when the job is preempted.
>> the problem is that you will free (in your patch: ignore) the consumables used by the subordinated job only *after* the new job has been dispatched to a node. It is this dispatch that causes the job to be preempted to get suspended, and only then will its resource consumption be ignored.
>> The real solution would be some kind of look-ahead feature to suspend a job (although there is still no job running in the superordinated queue) to get resources back with your patch, and after collecting all the necessary resources for the superordinated job to dispatch it to the node(s).
>> -- Reuti

