[GE users] resource allocation and race condition

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Mon Oct 18 07:47:42 BST 2004


Dear Mark,

In theory the scheduler works the way you want, with one minor problem.

Assume that you set up foo as described, assign it to the global host,
and have a load script reporting 1 for it. The scheduler dispatches the
first job and decreases foo to 0. The second job will not be dispatched,
since there is no foo resource left.
That is what you want, right?
The problem comes with the next scheduling run. The load values are only
updated every 40 seconds, or whatever interval is configured, which is
usually much longer than a scheduling run takes. Therefore the next
scheduling run will work on the old load report (foo = 1) and dispatch
the second job, which then fails because it cannot get the license.
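
To make the timing issue concrete, here is a small Python sketch of the
behaviour described above. It is a simplified model, not Grid Engine
code: the function names and the job names are made up for illustration,
and the only assumption is that each scheduling run starts from the last
load report and decrements its own working copy.

```python
# Toy model of the race: NOT Grid Engine code, all names are hypothetical.

def scheduling_run(reported_foo, pending_jobs):
    """One scheduling run that starts from the last load report."""
    available = reported_foo          # working copy, valid for this run only
    dispatched, still_pending = [], []
    for job in pending_jobs:
        if available >= 1:            # each job requests foo=1
            available -= 1
            dispatched.append(job)
        else:
            still_pending.append(job)
    return dispatched, still_pending


def simulate(runs_before_load_update):
    """Run the scheduler several times before the load sensor reports again."""
    reported_foo = 1                  # stale value from the last load report
    pending = ["job2", "job3"]
    started = []
    for _ in range(runs_before_load_update):
        dispatched, pending = scheduling_run(reported_foo, pending)
        started.extend(dispatched)
    return started


one_run = simulate(1)    # only the first waiting job is started
two_runs = simulate(2)   # the stale report lets the second job start as well
```

With a single run only one job starts. With two runs inside the same load
report interval both jobs start, although only one license exists.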

Why do you need a load script for the licenses? Why do you not just use
consumables for them?
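
For comparison, a sketch of the bookkeeping with a plain consumable and
no load script. Again this is only an illustrative model with a made-up
class name, assuming that the configured capacity is fixed and that the
amount in use is remembered across scheduling runs until the job
finishes. Note that such a counter only knows about licenses taken by
jobs started through the scheduler.

```python
# Toy model of a plain consumable: NOT Grid Engine code, names are hypothetical.

class Consumable:
    def __init__(self, capacity):
        self.capacity = capacity      # configured number of licenses
        self.in_use = 0               # booked by running jobs, kept across runs

    def try_dispatch(self, amount=1):
        """Book the resource if enough is left; return whether the job may start."""
        if self.in_use + amount <= self.capacity:
            self.in_use += amount
            return True
        return False

    def job_finished(self, amount=1):
        """Release the booking when a job ends."""
        self.in_use = max(0, self.in_use - amount)


foo = Consumable(capacity=1)
first = foo.try_dispatch()    # first job gets the license
second = foo.try_dispatch()   # second job must wait, in this and any later run
foo.job_finished()            # first job ends and releases the license
third = foo.try_dispatch()    # now exactly one waiting job can start
fourth = foo.try_dispatch()   # the other one keeps waiting
```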

Stephan



Olesen, Mark wrote:

>With the change to v6, I am hoping to tackle a longstanding problem.
>
>My applications all run with floating licenses that I track via a load
>sensor and the following type of consumable resource:
>
>  #name shortcut type relop requestable consumable default urgency
>  #---------------------------------------------------------------
>  foo   foo      INT  <=    YES         YES        0       1000
>
>Assuming that I only have a single float license 'foo', I can
>'qsub -l foo=1' a job.  After a while I submit two (2) new jobs with the
>same resource requirement(s). Both these jobs wait politely in the queue,
>since the resource 'foo' is unavailable.  After the first job finishes, and
>the load reports get correctly updated, *both* of the jobs in the queue try
>to grab the 'foo' resource (almost) simultaneously.
>How can I circumvent such a race condition? 
>
>The last paragraph from
>gridengine.sunsource.net/project/gridengine/howto/resource_management.html
>states the following:
>"Grid Engine uses both information sources and does its best to derive
> from this how much of the resource is really available."
>
>Is there a special setup required to get the '-l' resource request
>registered immediately, or is this information simply misleading?
>
>Do I need to use prolog/epilog to manipulate something?
>Is there a way of (mis)using the advance reservation to solve this problem? 
>
>
>/mark
>
>Dr. Mark Olesen
>Thermofluid Dynamics Analyst
>ArvinMeritor Light Vehicle Systems
>ArvinMeritor Emissions Technologies GmbH
>Biberbachstr. 9
>D-86154 Augsburg, GERMANY
>tel: +49 (821) 4103 - 862
>fax: +49 (821) 4103 - 7862
>Mark.Olesen at ArvinMeritor.com
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>  
>






