[GE users] Qlicserver question: internal license count

Olesen, Mark Mark.Olesen at arvinmeritor.com
Mon Oct 16 10:17:26 BST 2006



> resources. The next time qlicserver is running, the situation changes
> like
> 
> load_values      NONE
> complex_values   license=0 (*)
> (internal_count) license=2
> (available)      license=-2  (complex - internal_count)
> 
> (*) Set by qlicserver -> diff -> $managed = $total - $extern

It looks like the licenses may not have been initialized for managing
(i.e., using the '-C' option).

If everything is running correctly, the complex_values should report the
number of licenses that the GridEngine is allowed to manage (the total
reported licenses minus the licenses running external to the GridEngine).

Note: there was an oversight in the code that I've now fixed in my current
development version. If the license server query failed, the total licenses
would not be adjusted. I've changed it so that a failed query implies that
the license feature *cannot* be managed (i.e., total=0). Perhaps this is
causing the problem.
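
For illustration only, here is a minimal Python sketch of the arithmetic
described above (qlicserver itself is not written this way). The function
names are hypothetical, and representing a failed license server query as
total=None is an assumption made for this example.

```python
from typing import Optional


def managed_licenses(total: Optional[int], extern: int) -> int:
    """Licenses the GridEngine may manage (the complex_values entry).

    total=None stands for a failed license server query (an assumption
    of this sketch); the feature is then treated as unmanageable.
    """
    if total is None:
        return 0
    # $managed = $total - $extern
    return total - extern


def available_licenses(managed: int, internal_count: int) -> int:
    """What the scheduler still sees as free: complex - internal_count."""
    return managed - internal_count


# The situation from the quoted report: complex_values=0, internal_count=2
print(available_licenses(0, 2))
```

With complex_values at 0 and two licenses already counted internally, the
available value goes negative, which matches the quoted output.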

> 
> In other words: the internal licenses are counted twice? Maybe I
> misunderstand
> qlicserver, esp. munge_licenses is pretty complex (and it says
> 
> "   # remove usage that is already accounted for
>     # remove non-existent / implausible entry"

Munge licenses is unfortunately just as ugly as the task itself :(
Munge licenses tries to match up the internal resource usage (reported by
qstat) with the license usage reported from lmstat.
1) If we have a direct 'user@host nslots' match, we can be fairly sure that
the lmstat information corresponds to our job (i.e., the licenses are being
used internally).
2) If the previous did not match, we combine all the slots and counts for a
particular 'user@host' and see which lmstat information could correspond to
this.  Such situations can occur when a job uses parallel licenses split
across multiple servers, or if the parallel license can 'borrow' features.
All the lmstat reported licenses that do not correspond to one of the above
two matches are deemed to be external usage and thus decrease the amount of
resources that the GridEngine is permitted to manage.
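
The two matching passes can be sketched roughly as follows. This is not the
actual munge_licenses code: the function name, the tuple layout, and
especially the rule used in the second pass (accept an lmstat entry as
internal if its count fits into the remaining pooled slots for that
user@host) are assumptions made for illustration.

```python
from collections import Counter


def external_usage(qstat_jobs, lmstat_entries):
    """Return the number of licenses judged to be used outside the GridEngine.

    qstat_jobs:     list of (user, host, nslots) for running internal jobs
    lmstat_entries: list of (user, host, count) reported by the license server
    """
    # Pass 1: a direct 'user@host nslots' match is taken as internal usage.
    unmatched_jobs = Counter(qstat_jobs)
    leftover = []
    for entry in lmstat_entries:
        if unmatched_jobs[entry] > 0:
            unmatched_jobs[entry] -= 1
        else:
            leftover.append(entry)

    # Pass 2: pool the slots of the still-unmatched jobs per user@host.
    budget = Counter()
    for (user, host, nslots), njobs in unmatched_jobs.items():
        budget[(user, host)] += nslots * njobs

    # Assumed rule: an entry that fits into the pooled slots is internal;
    # everything else counts as external usage.
    external = 0
    for user, host, count in leftover:
        if 0 < count <= budget[(user, host)]:
            budget[(user, host)] -= count
        else:
            external += count
    return external


jobs = [("user1", "host1", 4)]
lmstat = [("user1", "host1", 2), ("user1", "host1", 2), ("user2", "host2", 1)]
print(external_usage(jobs, lmstat))
```

Here the 4-slot job of user1 is matched in the second pass against two
2-license entries (a parallel license split across servers), and only the
license held by user2 is counted as external.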

> Or the internal_count gets reset whenever qconf is used to change the
> complex_values? That would make sense then.

No. Internal count comes from the GridEngine (via qstat) and is never
adjusted. Note that it is possible to confuse the entire mechanism if you
qalter the '-l' requests on a running job. I'm not sure in which GridEngine
update that has been disabled.


> Obviously there is then the problem of a job picking up the license
> >after<
> jobstart and the following qlicserver ran again?

This is not really a (direct) problem. If the job takes a long time before
grabbing a license, the GridEngine mechanism and qlicserver are
unaffected. Strange things can, however, start happening in the following
situations:

1)
User1 starts an SGE job1 on host1, but it takes a very long time to start.
User1 starts a non-SGE job2 on host1, which occupies a license.
The qlicserver cannot distinguish between job1 and job2 for user1@host1, and
ignores the external license usage by job2.
SOLUTION: the user should think before running similar SGE/non-SGE jobs on
the same host.

2)
The SGE job1 takes a long time to start. The nasty colleague starts an
interactive job that blocks the license before job1 gets at the license.
SOLUTION: FlexLM reservation?

/mark


