[GE users] consumable license complexes and preemption
jtseng at sandforce.com
Tue Nov 16 15:43:11 GMT 2010
Hi Reuti, your comments are very valid.
As both sgenedharvey and you have pointed out, the preemption of
licenses (as opposed to machines) is not a "simple" operation.
The "don't count certain consumables when a job is suspended" patch is
only a building block toward a "look-ahead"-like feature.
In the past, I've artificially increased the number of licenses to allow
a "master" job to subordinate/preempt a "slave" job.
This requires a lot of complexity to make sure that the one "master" job
preempts the one "slave" job and that no rogue job gets through.
This can be done using per-host, per-CPU-slot "host queues" and using
queue thresholds to open/close specific host queues.
However, the complexity is enormous.
I'd like to simplify.
One workaround is to have a "dummy" job actually do the "slave"
subordination instead of the actual "master" job.
An external load sensor would determine when to allow
subordination/preemption and submit the dummy job.
Since the dummy job would cause the license to be freed, the master
job can then take the license.
The dummy job would quit after the master job has finished. The slave
job would then resume.
In this scenario, it is not necessary that the "master" job run on the
same machine as the "slave" job.
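
A rough sketch of the decision such an external sensor would have to make
before submitting the dummy job, in Python. Everything here is hypothetical:
the Job record, the "master"/"slave" labels and the smallest-footprint policy
are assumptions for illustration, not SGE APIs, and the free license count is
assumed to come from the license server by some means not shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Job:
    job_id: int
    role: str      # "master" or "slave": labels from this thread, not SGE terms
    state: str     # "pending" or "running"
    licenses: int  # licenses the job requests (pending) or holds (running)


def pick_slave_to_preempt(jobs: Sequence[Job], free_licenses: int) -> Optional[Job]:
    """Return the slave job a dummy job should suspend, or None."""
    pending_masters = [j for j in jobs
                       if j.role == "master" and j.state == "pending"]
    if not pending_masters:
        return None  # nothing is waiting, so no preemption
    # Only the first waiting master is considered, so at most one
    # preemption is triggered per decision.
    shortfall = pending_masters[0].licenses - free_licenses
    if shortfall <= 0:
        return None  # the master can start without suspending anyone
    candidates = [j for j in jobs
                  if j.role == "slave" and j.state == "running"
                  and j.licenses >= shortfall]
    if not candidates:
        return None  # no single slave job frees enough licenses
    # Assumed policy: suspend the slave holding the fewest licenses.
    return min(candidates, key=lambda j: (j.licenses, j.job_id))
```

The caller would then submit the dummy job against whatever queue
subordinates the chosen slave job; that part is site-specific and is
deliberately left out of the sketch.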
The only issue left is that SGE sees the "slave" job still consuming a
license.
sgenedharvey points out the amount of complexity that can occur in
determining which "slave" job to subordinate, but that is left as an
exercise to the reader :)
If the "don't count certain consumables when a job is suspended" patch
is implemented, then the workaround is much more straightforward.
Future patches can build upon this.
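
To make the accounting difference concrete, here is a toy model in Python.
It is only a sketch under stated assumptions: the function names and the
(state, licenses) tuples are made up for illustration and do not reflect how
the 6.2u5 scheduler actually stores or counts consumables.

```python
from typing import Iterable, Tuple

JobUsage = Tuple[str, int]  # (state, licenses held); states are illustrative


def licenses_in_use(jobs: Iterable[JobUsage], ignore_suspended: bool) -> int:
    """Sum consumable usage, optionally skipping suspended jobs (the patch idea)."""
    return sum(n for state, n in jobs
               if not (ignore_suspended and state == "suspended"))


def can_dispatch(request: int, total: int, jobs: Iterable[JobUsage],
                 ignore_suspended: bool) -> bool:
    """True if `request` licenses fit into what the scheduler believes is free."""
    return total - licenses_in_use(jobs, ignore_suspended) >= request


# One license in total, held by a slave job that a dummy job has suspended.
suspended_slave = [("suspended", 1)]
print(can_dispatch(1, 1, suspended_slave, ignore_suspended=False))
print(can_dispatch(1, 1, suspended_slave, ignore_suspended=True))
```

With the suspended job still counted, the master job cannot be dispatched;
with suspended jobs ignored, it can. Note that while the slave job is still
running, neither variant lets the master job through, which is why something
(the dummy job today, a look-ahead feature later) has to suspend it first.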
Perhaps the scheduler can account for subordination like it does for
reservation - but I haven't reviewed the code nor do I understand the
On 11/16/2010 3:47 AM, reuti wrote:
> On 16.11.2010 at 00:15, jtseng_sf wrote:
>> Hi Everyone, I'm thinking of patching 6.2u5 to allow certain
>> consumables to be NOT counted by SGE when the job is preempted.
> The problem is that you will free (in your patch: ignore) the consumables used by the subordinated job only *after* the new job has been dispatched to a node; it is this dispatch that causes the to-be-preempted job to be suspended, and only then will its resource consumption be ignored.
> The real solution would be some kind of look-ahead feature to suspend a job (although there is still no job running in the superordinated queue) to get resources back with your patch, and after collecting all the necessary resources for the superordinated job to dispatch it to the node(s).
> -- Reuti