[GE users] consumable license complexes and preemption

sgenedharvey sge at nedharvey.com
Wed Nov 17 16:22:31 GMT 2010

> From: reuti [mailto:reuti at staff.uni-marburg.de]
> http://wikis.sun.com/display/gridengine62u5/Scheduling+Strategies
> section "Resource Reservation and Backfilling".

Excellent.  Thank you very much.

So the conclusion here is:  
#1 Selecting a job for suspension involves some complexity.  I have
overcome it, at least well enough for my work environment.  Most likely my
company won't allow me to release the code as open source, but I can ask.
It took several days of coding effort to get it all functional and stable,
and it might be useful to others.  I can certainly talk about it and offer
advice to anybody else who is doing something similar, even if I can't
release my code.
#2 There is a "gotcha": in order to use "qmod -sj" for this purpose, you
need to configure the signal used to suspend your job so that the license
is actually released in lmgrd.  The standard suspend signal does not release
the license from lmgrd.  This is easy to remedy, at least with the Synopsys
tools that my users are running.  Hopefully the behavior is consistent
across the various applications people run under SGE, but it does in fact
depend on the software vendor.  ("Easy" is a relative term.  It took me
several hours to figure it out, and I'm pretty good at this stuff.)
#3 It certainly seems possible to add a new feature to qlicserver
(flex-grid) which would "notice" suspended jobs, and artificially inflate
the number of consumables accordingly.
#4 You should be able to submit a "qmod -usj" unsuspend command via "qsub -R
y".  The unsuspend command will then wait in the queue, reserve the
necessary resources according to its priority, and run only once they are
all available, so the suspended job is resumed at the right time with all
the right resources.
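
To make #3 a bit more concrete, here is a minimal sketch in Python of the
arithmetic I have in mind.  This is not qlicserver code, and I have not
wired it into qlicserver.  The function name, the resource names (lic_vcs,
lic_dc) and the shape of the input are all made up for illustration.  The
idea is only that a suspended job still holds its consumable in SGE even
though it has handed the license back to lmgrd, so the total that SGE sees
gets bumped up by whatever the suspended jobs are holding.

```python
def inflate_consumables(base_totals, suspended_requests):
    """Return license totals adjusted for suspended jobs.

    base_totals: dict of consumable name -> count reported by the license
        server (hypothetical names, e.g. "lic_vcs").
    suspended_requests: list of dicts, one per suspended job, mapping
        consumable name -> amount that job requested.

    Assumption: every suspended job really has released its licenses in
    lmgrd (see #2), so its request can be added back to the total.
    """
    adjusted = dict(base_totals)  # do not modify the caller's dict
    for request in suspended_requests:
        for name, amount in request.items():
            # Only license consumables we manage are inflated; other
            # requests (memory, slots, ...) are ignored.
            if name in adjusted:
                adjusted[name] += amount
    return adjusted


if __name__ == "__main__":
    totals = {"lic_vcs": 10, "lic_dc": 4}
    suspended = [{"lic_vcs": 1}, {"lic_vcs": 2, "mem": 8}]
    print(inflate_consumables(totals, suspended))
```

With the made-up numbers above, lic_vcs goes from 10 to 13 and lic_dc stays
at 4.  The hard part, which this does not show, is finding out which jobs
are suspended and what they requested.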

I think this is an actual realistic recipe.  I have personally done #1 and
#2.  I looked into #3 and it seemed ... slightly more than trivial, but
certainly possible.  And I've never done #4 but I don't see any reason why
it wouldn't work.
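
Since I have not tried #4, take this as an untested sketch only.  It is a
small Python helper that builds the qsub command line for it.  The helper
name, the job name prefix and the lic_vcs resource are hypothetical.  It
assumes the unsuspend job requests the same consumables the suspended job
needs, and that the host it lands on is allowed to run qmod against that
job.

```python
import shlex


def build_unsuspend_submission(job_id, resources):
    """Build the argv for a qsub that runs `qmod -usj <job_id>`.

    job_id: id of the suspended job.
    resources: dict of consumable name -> amount to request with -l, so
        the scheduler can reserve them (hypothetical names).
    """
    argv = [
        "qsub",
        "-R", "y",                      # ask for resource reservation
        "-b", "y",                      # the command is a binary, not a script
        "-N", f"unsuspend_{job_id}",    # job name (must not start with a digit)
    ]
    if resources:
        # Sort so the generated command line is reproducible.
        request = ",".join(f"{k}={v}" for k, v in sorted(resources.items()))
        argv += ["-l", request]
    argv += ["qmod", "-usj", str(job_id)]
    return argv


if __name__ == "__main__":
    argv = build_unsuspend_submission(4711, {"lic_vcs": 1})
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(argv))
    # prints: qsub -R y -b y -N unsuspend_4711 -l lic_vcs=1 qmod -usj 4711
```

Building a list and passing it to subprocess.run() avoids any shell
quoting problems when you do want to submit it for real.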

