[GE users] consumable license complexes and preemption

sgenedharvey sge at nedharvey.com
Tue Nov 16 01:20:00 GMT 2010

> From: jtseng_sf [mailto:jtseng at sandforce.com]
> Hi Everyone,  I'm thinking of patching 6.2u5 to allow certain consumables
> be NOT counted by SGE when the job is preempted.
> Comments? Suggestions?

For precisely the same reason, I would very much like this feature.  For the
time being, we have chosen to implement job killing and resubmission rather
than suspension, specifically because there is apparently no way to release
resources while a job is suspended.

All the same, I was continually surprised, from one step to the next, at how
complex the whole problem was.  There's a lot of complexity in (a)
choosing which job to suspend, and (b) choosing which jobs to unsuspend, and when.

When it comes to choosing which job to suspend, I implemented it this way:
a loadsensor continually scans the queue.  Whenever there is a higher
priority job waiting (even that test was difficult to define, due to the
variable nature of job priorities), the loadsensor scans the list of all
running jobs to see if there is any lower priority job whose resource
consumption would enable the higher priority job to run, if only the lower
priority running job were out of the way.

In order to implement that, I needed an object-oriented way to compare
waiting jobs against the available resources and the resources held by
running jobs.  I had to find *all* of the running jobs which could possibly
satisfy the needs of my waiting job... and then decide which one to suspend.
So I'm literally reimplementing some job prioritization/sorting code...  I
started by looking at the flex-grid (qlicserver) source code.  Its author did
a fine job, but his fault was in using perl instead of python.  ;-)  Seriously
though, in this case python was a much better choice, because here's how it
works:  you run "qstat -F -r -xml" and then use the xml.dom.minidom
module to parse it into object-oriented goodness.  The advantage python had
over perl in this case was its universally available, solid, standard
XML parsing library.  Python supports actual object orientation, whereas
perl really doesn't.  Perl is a hack at best, in this regard.
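A rough sketch of the "qstat -F -r -xml" plus xml.dom.minidom approach follows.  The element names (`job_list` with a `state` attribute, `JB_job_number`, `JAT_prio`, `slots`, `hard_request`) are assumptions based on 6.2-era XML output and should be checked against your own qstat; the sample document and the `vcs_lic` complex are made up for illustration.

```python
import xml.dom.minidom
from dataclasses import dataclass, field

# Hypothetical, heavily trimmed sample of "qstat -F -r -xml" output.
SAMPLE = """<?xml version='1.0'?>
<job_info>
  <queue_info>
    <Queue-List>
      <name>all.q@node1</name>
      <job_list state="running">
        <JB_job_number>101</JB_job_number>
        <JAT_prio>0.25</JAT_prio>
        <slots>1</slots>
        <hard_request name="vcs_lic" resource_contribution="0.0">1</hard_request>
      </job_list>
    </Queue-List>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>202</JB_job_number>
      <JAT_prio>0.75</JAT_prio>
      <slots>1</slots>
      <hard_request name="vcs_lic" resource_contribution="0.0">1</hard_request>
    </job_list>
  </job_info>
</job_info>"""


@dataclass
class Job:
    number: int
    state: str
    prio: float
    slots: int
    requests: dict = field(default_factory=dict)


def _text(node, tag, default=""):
    """Text of the first <tag> element under node, or default."""
    elems = node.getElementsByTagName(tag)
    if not elems or elems[0].firstChild is None:
        return default
    return elems[0].firstChild.data.strip()


def parse_jobs(xml_text):
    """Turn qstat XML into Job objects, running and pending alike."""
    doc = xml.dom.minidom.parseString(xml_text)
    jobs = []
    for jl in doc.getElementsByTagName("job_list"):
        requests = {}
        for hr in jl.getElementsByTagName("hard_request"):
            requests[hr.getAttribute("name")] = float(hr.firstChild.data)
        jobs.append(Job(
            number=int(_text(jl, "JB_job_number")),
            state=jl.getAttribute("state"),
            prio=float(_text(jl, "JAT_prio", "0")),
            slots=int(_text(jl, "slots", "1")),
            requests=requests,
        ))
    return jobs
```

In practice the input would come from something like `subprocess.run(["qstat", "-F", "-r", "-xml"], capture_output=True, text=True).stdout` instead of `SAMPLE`; the running and waiting lists are then just `[j for j in jobs if j.state == "running"]` and the equivalent for `"pending"`.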

After selecting a job for resubmission, I had to scan all the waiting
jobs and build a job dependency tree (well, just find the dependents waiting
on the job that I'm about to kill) and update all of the jobs that are
waiting for this one.  Because qresub just creates a copy of a job, I had to
qresub the running job, then modify all the dependents (qalter -hold_jid) to
depend on the new job ID, and then qdel the running job.  This would not be
an issue for you if you're suspending instead of resubmitting.
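The rewiring step might look roughly like the sketch below.  It assumes you already have the new job ID from qresub and a map of each waiting job to the IDs it holds on (a hypothetical input format).  It also assumes `qalter -hold_jid` replaces a job's entire hold list, which is why every remaining dependency is passed, not just the new one.  The function only builds the command lines; it does not run them.

```python
def rewire_commands(old_id, new_id, holds):
    """Build (but do not run) the commands that move every dependent of
    old_id onto new_id and then delete old_id.

    holds: {waiting_job_id: [job ids it is held on]} (hypothetical format).
    """
    cmds = []
    for job_id in sorted(holds):
        deps = holds[job_id]
        if old_id not in deps:
            continue  # this waiting job does not depend on the victim
        new_deps = [new_id if d == old_id else d for d in deps]
        # Assumption: -hold_jid replaces the whole list, so pass all of it.
        cmds.append(["qalter", "-hold_jid",
                     ",".join(str(d) for d in new_deps), str(job_id)])
    # Only after the dependents point at the copy is the original removed.
    cmds.append(["qdel", str(old_id)])
    return cmds
```

Each returned argv list can be handed to `subprocess.run(argv, check=True)` in order.  Keeping the command building separate from execution makes it easy to dry-run the plan before killing anything.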

While testing the idea of suspending instead of resubmitting...  (this part
is very relevant to you) ... please double-check the facts here, cuz I'm
surely going to mix up "SIGSTOP" and "SIGTSTP" ...  When you tell SGE to
suspend a running job, by default it sends SIGSTOP to the running job.  This
effectively suspends the job, but it does so at the kernel level:  SIGSTOP
cannot be caught or ignored, so the job has no idea that it's been suspended,
and it doesn't free up its license in the license server.  (At least in
synopsys tools.)  I had to modify the suspend command (the suspend_method
queue parameter) to send SIGTSTP instead of SIGSTOP.  This way, the running
process gets the signal, releases the license from flexlm, and voluntarily
goes to sleep.
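The difference between the two signals can be illustrated with a small POSIX-only sketch of what a well-behaved tool does on its side.  `LicenseHolder` is a hypothetical stand-in for the tool's FlexLM checkout/checkin calls; how any real synopsys tool handles SIGTSTP internally is not shown here.  The handler returns the license and then stops the process for real with SIGSTOP, and a SIGCONT handler checks the license back out on resume.

```python
import os
import signal


class LicenseHolder:
    """Hypothetical stand-in for a tool's FlexLM checkout/checkin calls."""

    def __init__(self):
        self.held = False

    def checkout(self):
        self.held = True   # a real tool would contact the license server here

    def checkin(self):
        self.held = False  # ...and return the license here


def install_handlers(lic, stop_self=None):
    """POSIX only. On SIGTSTP: return the license, then stop for real.
    On SIGCONT: check the license back out before carrying on."""
    if stop_self is None:
        # SIGSTOP cannot be caught, so this really does stop the process.
        stop_self = lambda: os.kill(os.getpid(), signal.SIGSTOP)

    def on_tstp(signum, frame):
        lic.checkin()
        stop_self()  # the process stays stopped here until it gets SIGCONT

    def on_cont(signum, frame):
        if not lic.held:
            lic.checkout()

    signal.signal(signal.SIGTSTP, on_tstp)
    signal.signal(signal.SIGCONT, on_cont)
```

The same program cannot install a handler for SIGSTOP at all (the OS rejects it), which is exactly why a job stopped with SIGSTOP never gets the chance to return its license.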

So the problem is:  you need to free up the license in flexlm, *and* you
need to free up the complex in SGE.  I was able to accomplish the former,
but not the latter.  I thought perhaps qlicserver could "notice" the
suspended jobs and report a new field besides "intern" and "extern":  a
"suspended" field, which would artificially inflate the number of available
consumables by the amount held by suspended jobs...  But even if that were a
simple task, there's complexity in resuming too.
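The arithmetic behind that "suspended" field idea could look like the sketch below.  It is purely hypothetical: qlicserver has no such field, and the single-letter states `s`, `S` and `T` (job suspended, queue suspended, threshold suspended) are assumed from qstat's state column.  Capacity is reported as total minus external usage, plus whatever suspended jobs have booked.  SGE then still subtracts every booking, suspended ones included, so the net effect is that suspended jobs stop blocking anyone.

```python
from collections import Counter

# Assumed qstat state letters meaning "suspended" in some form.
SUSPENDED_STATES = set("sST")


def suspended_usage(jobs):
    """Sum the per-resource requests of suspended SGE jobs.
    jobs: iterable of (state_string, {resource: amount}) pairs."""
    used = Counter()
    for state, requests in jobs:
        if any(ch in SUSPENDED_STATES for ch in state):
            used.update(requests)  # Counter.update adds amounts
    return used


def inflated_capacity(total, extern, jobs):
    """Hypothetical capacity to publish for each consumable: licenses not
    used outside SGE, plus those still booked by suspended SGE jobs."""
    susp = suspended_usage(jobs)
    return {res: max(0, total[res] - extern.get(res, 0)) + susp.get(res, 0)
            for res in total}
```

For example, with 10 licenses, 3 used outside SGE, one running job holding 1 and suspended jobs holding 3, the published capacity is 10; after SGE subtracts all 4 bookings, 6 remain, which is exactly total minus external minus running usage.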

Ideally, here's what you want.  The suspended job could be waiting for more
than one resource:  a slot on the machine where it is running (sleeping),
one or more licenses from flexlm, perhaps any number of resources (but
probably just the flexlm licenses and slots).  So ideally, you would want the
suspended job to "virtually" grab resources back:  hold the resources and
prevent any equal or lower priority jobs from taking them, but release them
if holding them would cause any higher priority job to wait.  If you're not
careful here, you'll unsuspend the job instead of running a higher priority
job, or you'll never unsuspend the job because lower priority jobs get the
resources instead.  Only when all the resources become available again is
the suspended job eligible to resume.
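Those resume rules can be sketched as two small checks.  This is a minimal illustration under assumed names (`Job`, `visible_free`, `may_resume`) and the same flattened `{resource: amount}` pool as before, not a description of anything SGE does itself.  `visible_free` is the "virtual hold": equal or lower priority jobs do not see what the suspended job is reserving.  `may_resume` allows a resume only when everything the job needs is free and taking it would not block a higher priority waiting job that could otherwise start now.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical minimal job record: id, priority, resource requests."""
    number: int
    prio: float
    requests: dict = field(default_factory=dict)


def fits(job, free):
    """True if every request of `job` is covered by `free`."""
    return all(free.get(res, 0) >= amt for res, amt in job.requests.items())


def visible_free(free, suspended, job):
    """Resources `job` may use.  Higher priority jobs see everything;
    equal or lower priority jobs do not see the suspended job's hold."""
    if job.prio > suspended.prio:
        return dict(free)
    return {res: max(0, amt - suspended.requests.get(res, 0))
            for res, amt in free.items()}


def may_resume(suspended, free, waiting):
    """Resume only when all needed resources are back, and only if doing so
    would not make a runnable higher priority job wait."""
    if not fits(suspended, free):
        return False  # not every resource is available yet
    after = {res: amt - suspended.requests.get(res, 0)
             for res, amt in free.items()}
    for job in waiting:
        if job.prio > suspended.prio and fits(job, free) and not fits(job, after):
            return False  # the higher priority job should get them instead
    return True
```

Combining the two avoids both failure modes from the post: lower priority jobs cannot starve the suspended job (they never see its reserved share), and the suspended job cannot jump ahead of a higher priority job that fits right now.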


More information about the gridengine-users mailing list