[GE users] The "Olesen Method" flexlm works great!!

Olesen, Mark Mark.Olesen at arvinmeritor.com
Mon Jan 2 08:34:26 GMT 2006


Just a few points that could be clarified:

> > The standard load sensor approach can run into
> > accuracy difficulties
> > when non-cluster entities check out licenses.

The problem with the load sensor method exists regardless of
internal/external license usage.
Imagine the following situation:
  * a single software license 'managed' by a load sensor
  * a load sensor interval of say 60 seconds
  * a 1 second dispatch interval
  * 100 jobs waiting in the queue for the single software license

Assuming that the load sensor reports one free license, we have informed
GridEngine that the resource 'license=1' is available ... at least until the
next load report in 60 seconds. Since the load sensor, not the scheduler, is
responsible for incrementing/decrementing the license count, the scheduler
does not reduce the value when it starts a job and is happy to fire off as
many jobs as possible.
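To make the arithmetic of this scenario concrete, here is a minimal Python sketch of the numbers above. It is a simplified model, not GridEngine's actual scheduler: the function name `jobs_started` is hypothetical, and it assumes one scheduling run per dispatch interval that starts at most one job. A real scheduling run could start even more jobs at once.

```python
def jobs_started(licenses: int, queued: int, sensor_interval: int,
                 dispatch_interval: int, scheduler_decrements: bool) -> int:
    """Count jobs started between two load reports (simplified, hypothetical model)."""
    # Value from the load report at t=0; it is not refreshed until t=sensor_interval.
    free_seen = licenses
    started = 0
    # One scheduling run per dispatch interval until the next load report.
    for _ in range(0, sensor_interval, dispatch_interval):
        if started < queued and free_seen >= 1:
            started += 1  # simplification: at most one job per scheduling run
            if scheduler_decrements:
                # Consumable-style bookkeeping done by the scheduler itself.
                free_seen -= 1
    return started


if __name__ == "__main__":
    print("load value only:", jobs_started(1, 100, 60, 1, scheduler_decrements=False))
    print("consumable:", jobs_started(1, 100, 60, 1, scheduler_decrements=True))
```

With a single license, a 60 second load sensor interval and a 1 second dispatch interval, this prints 60 for the load-value-only case and 1 when the scheduler decrements the count itself. Even under this conservative assumption, 59 jobs are started that cannot obtain a license.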

At first this may be conceptually difficult to understand, but if you
replace 'license=1' with a load report such as 'interface=1000' (e.g., a
Gigabit interface), it becomes clear that GridEngine cannot treat a resource
as consumable *and* as a load value at the same time.

A load sensor approach for license management will always be a disaster!

>  Thanks for the info. But as far as my understanding
> goes even in the Olesen method we have a bit of a
> delay till the qlicserver runs a check on the license
> servers? or does qlicserver get notified by FlexLM
> when someone checks out a license ?

Yes, this is indeed an inherent problem with any external license usage.
Simply speaking, we can add some defensive programming, but there is still
(unfortunately) an element of luck.

The first line of defence is to add a check for available licenses to the
job script, for example using 'qlicserver -l ...', and to exit with status 99
(which makes GridEngine reschedule the job) if they are no longer available.
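A minimal Python sketch of such a pre-check follows. It relies only on the GridEngine convention that a job script exiting with status 99 is rescheduled. Obtaining the free-license count is deliberately left out: the sketch assumes the count has already been extracted elsewhere (for example from the 'qlicserver -l ...' output) and is passed on the command line. The names `precheck_exit_code` and `main` are hypothetical.

```python
import sys

# GridEngine reschedules a job whose job script exits with status 99.
RESCHEDULE = 99


def precheck_exit_code(needed: int, available: int) -> int:
    """Return 0 if enough licenses are free, otherwise the reschedule status."""
    return 0 if available >= needed else RESCHEDULE


def main(argv: list[str]) -> int:
    # Hypothetical interface: argv[1] = licenses needed, argv[2] = licenses free.
    # How the free count is obtained is an assumption left to the caller.
    needed, available = int(argv[1]), int(argv[2])
    return precheck_exit_code(needed, available)


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The job script would run this before launching the solver. It would start the solver only on status 0 and otherwise exit 99 itself. This narrows the window for a race, but it cannot close it.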

Nonetheless, some race conditions are simply beyond our control.
Imagine the following (not so hypothetical) situation:
  * All licenses are available.
  * A GridEngine job starts.
  * Approximately 30 seconds later, an external job starts.
By all logic, everything is okay. In this example, however, the GridEngine
job first does some domain decomposition and saves some old data before
starting the actual calculation stage (where the licenses are required). The
external job, in contrast, is a restart without domain decomposition, and
thus acquires the licenses first, even though it started later.
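The race can be expressed as a short Python sketch: what matters is when each job actually requests its licenses, not when it starts. The setup times below are hypothetical, since the scenario only fixes the 30 second offset between the two job starts. The `Job` class and `checkout_order` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    start: float  # seconds after the first job starts
    setup: float  # seconds of work before licenses are requested (hypothetical)

    @property
    def checkout_time(self) -> float:
        # The moment the job actually asks the license server for licenses.
        return self.start + self.setup


def checkout_order(jobs: list[Job]) -> list[str]:
    """Order in which jobs request licenses (earliest first)."""
    return [job.name for job in sorted(jobs, key=lambda job: job.checkout_time)]


if __name__ == "__main__":
    jobs = [
        Job("gridengine", start=0, setup=120),  # decomposition + saving old data
        Job("external", start=30, setup=5),     # restart, no decomposition
    ]
    print(checkout_order(jobs))
```

With these assumed setup times, the list printed is `['external', 'gridengine']`. The job that started later is the first to request licenses, and a scheduler that sees only job start times has no way to anticipate this.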

Solving this problem is well and truly beyond the scope of *any* scheduler!


Dr. Mark Olesen
Principal Engineer Thermofluids Analysis
ArvinMeritor Light Vehicle Systems
ArvinMeritor Emissions Technologies GmbH
Biberbachstr. 9
D-86154 Augsburg, GERMANY
