[GE users] qlicserver behavior for suspended jobs

gutnik gutnik at gmail.com
Fri Nov 20 15:17:27 GMT 2009


> I don't see much difference in what you describe.
> With many CPUs and few licenses, you still wish to avoid jobs starting
> when there aren't enough licenses.
>
> If you use a plain load sensor with your situation (many cpus and few
> licenses), you will get what I termed a "crash condition" in the pdf
> presentation
> (http://olesenm.github.com/flex-grid/doc/SGE-WS2007-FlexLM-Integration-MarkOlesen.pdf), since calling it a race condition is really much too mild.
>
> Let's see what could happen in your case if you use a plain load sensor.
> To illustrate things, we'll deliberately make it quite extreme.
> Say you have lots of CPUs (1000) and several waiting jobs (500) but
> relatively few licenses available (currently 0, for whatever reason).
>
> The load sensor reports 0 licenses.
> The cluster is empty (1000 slots available), but the 500 jobs are
> waiting correctly for a license, since you specified '-l somelicense=1'.
> At some point, your rare license becomes available.
> At some reporting interval the load sensor will report 1 license is
> available.
> The resource conditions (slots and license) are now satisfied and the
> scheduler dispatches the 500 jobs.
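
To make the quoted scenario concrete, here is a toy model in Python. It is
not Grid Engine code. It assumes the scheduler trusts the last load sensor
value for a whole scheduling pass and never reduces it while dispatching;
the function name and the numbers are only illustrative.

```python
def dispatch_with_stale_sensor(waiting_jobs, free_slots, reported_licenses):
    """Toy model: count the jobs started in one scheduling pass.

    Assumption: the reported license count is a plain load value that is
    not reduced while jobs are dispatched within the pass.
    """
    started = 0
    for _ in range(waiting_jobs):
        if free_slots - started < 1:
            break  # no free slot left
        if reported_licenses < 1:
            break  # '-l somelicense=1' is not satisfied
        started += 1  # note: reported_licenses is never decremented
    return started


# Numbers from the scenario: 1000 slots, 500 waiting jobs, 1 license reported.
started = dispatch_with_stale_sensor(500, 1000, 1)
print(started, "jobs started for 1 reported license")
```

Under that assumption every waiting job passes the same license check, so
all of them start even though only one license exists.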

Right, here is where my understanding of the scheduler falters. I thought
I could get around this problem by using job_load_adjustments, and saying
that every low-priority license-taking job has an "adjustment" value of
1 license. That way, when the 1 license is available, the scheduler could
only schedule 1 job (because once that job is scheduled, there would
temporarily be no more licenses available).
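
A toy sketch of that idea in Python, purely as an illustration of the
behavior hoped for and not a description of what the scheduler really
does. The `adjustment` argument stands in for a per-job
job_load_adjustments value of 1 license; the function and its parameters
are made up for this example.

```python
def dispatch_with_adjustment(waiting_jobs, free_slots, reported_licenses,
                             adjustment=1):
    """Toy model: jobs started in one pass when each dispatched job
    temporarily lowers the license count seen by the scheduler.

    Assumption: the adjustment is subtracted from the reported value
    for the rest of the pass. Real scheduler behavior may differ.
    """
    available = reported_licenses
    started = 0
    while started < waiting_jobs and started < free_slots:
        if available < 1:
            break  # license request can no longer be satisfied
        started += 1
        available -= adjustment  # hypothetical per-job adjustment
    return started


# 500 waiting jobs, 1000 free slots, load sensor reports 1 license.
print(dispatch_with_adjustment(500, 1000, 1, adjustment=1))
print(dispatch_with_adjustment(500, 1000, 1, adjustment=0))
```

With an adjustment of 1 the model starts a single job; with no adjustment
it starts every waiting job, which is the situation to avoid.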

So,
1) I don't know if the scheduler really works that way. I think it
   might, but I haven't tested it very carefully.
2) I don't see a way to set job_load_adjustments per queue, which is
   what I'd need for this to work at all well.


That said, this feels like a hack, even if it would work. I'd be happy to
find another solution, but I don't see one for the low-priority jobs.
Do you?

    Vadim

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=228237
