[GE users] Floating node-locked license management and qlicserver
reuti at staff.uni-marburg.de
Thu Jul 30 11:49:56 BST 2009
On 30.07.2009 at 10:43, daireb wrote:
> This is a problem I have come across before (you can probably
> search the mailing list for my posts on it). I think the simplest
> solution in the end is to simply set aside some compute nodes for
> jobs with these licenses. So if you have 40 licenses you would set
> aside 40 hosts. Obviously this is very wasteful if you have a lot
> of the licenses but the licenses may also cost much more than a
> compute node anyway. Perhaps you could also tell jobs which request
> the licenses to just "prefer" these machines and all other jobs to
> "prefer to avoid" these machines but that sounds like what you have
> with your load sensors anyway.
> There may be some new tricks in SGE's armoury (v6.2) since I last
> looked at this to improve the "single license consumed for entire
> host" licensing scheme.
It's already filed as an RFE, as an extension to the recently upgraded
"consumable" attribute, but it is not yet implemented.
> ----- "blair" <blair.bethwaite at infotech.monash.edu.au> wrote:
>> Hi all,
>> Apologies for the long post, but I'm trying to share some
>> experience as well as ask a couple of questions...
>> Recently I spent a good portion of a week struggling with the
>> following resource management issue in SGE (currently using 6.1u4)...
>> We're making available a software package, let's call it TheSoftware,
>> which uses the FlexLM license manager to dish out licenses on a rather
>> odd basis - they are floating but, once assigned, node-locked (at
>> least this seems to be the terminology used in lmstat).
>> What this actually means is that one license for TheSoftware is
>> consumed per user per node. We're running a commodity high-throughput
>> cluster, so naturally all the nodes have many CPUs/cores, and it is
>> possible for a user to have several instances of TheSoftware running
>> on a single node, which serendipitously means they can get extra bang
>> for a single license. E.g. say there are five 8-core nodes (with 8
>> slots each) free, 20 licenses for TheSoftware available, and a user
>> submits 40 jobs using TheSoftware - all 40 jobs can run in parallel
>> and only consume 5 TheSoftware licenses. However, making the
>> scheduler manage this seems to be far from trivial - even when
>> simplified for the case of a single user running TheSoftware...
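The counting rule in the example above can be written down directly: licenses in use equal the number of distinct (user, host) pairs, not the number of jobs. A minimal Python sketch; `Job`, the host names and the user name are made up for illustration:

```python
from collections import namedtuple

# A running TheSoftware job: who submitted it and which host it runs on.
Job = namedtuple("Job", ["user", "host"])


def licenses_in_use(jobs):
    """Licenses checked out under the per-user-per-node scheme: one per
    distinct (user, host) pair, however many instances that user runs there."""
    return len({(job.user, job.host) for job in jobs})


if __name__ == "__main__":
    hosts = [f"node{i:02d}" for i in range(1, 6)]  # five 8-core nodes
    jobs = [Job("alice", h) for h in hosts for _ in range(8)]  # 40 jobs, 8 per node
    print(len(jobs), licenses_in_use(jobs))  # 40 5
```

Placement is what drives the count: the same 40 jobs spread one per host over 40 hosts would need 40 licenses.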
>> From the start I adopted a hybrid consumable + load sensor approach
>> that turns out to be very similar to that documented here,
>> which I found later - the main difference being that mine is just
>> shell scripts with no caching and is specific to TheSoftware.
>> It took me a while to figure out how to get SGE to maximise use of
>> licenses by preferring to schedule jobs requesting TheSoftware
>> licenses/consumable to a machine already running TheSoftware. My
>> initial thought was to create a queue for each host with increasing
>> sequence numbers, but that seemed like a very roundabout (and painful
>> to maintain) hack. After being frustrated by the documentation
>> regarding load sensors in the admin guide, I finally stumbled onto the
>> sge_conf man page and realised this could all be configured on a per
>> host basis, and also that the load_sensor field was actually a list of
>> paths. So I added a new sensor to the global configuration that
>> sets a boolean resource value on each host indicating whether
>> TheSoftware is running there; jobs then make a soft resource request
>> for this resource to be true. This approach works to a certain
>> extent.
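A load sensor of the kind described above could look roughly like the following minimal Python sketch, assuming a Linux host (it scans `/proc/<pid>/comm`). It follows the load sensor protocol execd uses: read a line from stdin, exit on `quit`, otherwise print `begin`, one `host:complex:value` line, and `end`. The complex name `thesoftware_running` and process name `thesoftware` are hypothetical; the complex would need to be defined as a BOOL (e.g. via `qconf -mc`), the script path added to `load_sensor`, and jobs would then ask for it with something like `qsub -soft -l thesoftware_running=true`.

```python
import os
import socket
import sys

# Hypothetical names (not from the original post): the BOOL complex the
# jobs soft-request, and the process name TheSoftware runs under.
RESOURCE = "thesoftware_running"
PROCESS_NAME = "thesoftware"


def is_running(process_name, proc_root="/proc"):
    """True if any process under proc_root has this name (Linux /proc/<pid>/comm)."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return False
    for pid in entries:
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, pid, "comm")) as f:
                if f.read().strip() == process_name:
                    return True
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return False


def report(host, resource, running):
    """One load report: begin, host:name:value, end."""
    # 1/0 is assumed to be accepted as a value for a BOOL complex.
    return ["begin", f"{host}:{resource}:{1 if running else 0}", "end"]


def main(stdin=sys.stdin, stdout=sys.stdout):
    host = socket.gethostname()  # assumed to match SGE's view of the host name
    # execd writes a line each load report interval and "quit" at shutdown.
    while True:
        line = stdin.readline()
        if not line or line.strip() == "quit":
            break
        for out in report(host, RESOURCE, is_running(PROCESS_NAME)):
            stdout.write(out + "\n")
        stdout.flush()


if __name__ == "__main__":
    main()
```

Like the original shell scripts, this does no caching: `/proc` is rescanned on every report.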
>> A couple of issues remain though (any suggestions would be welcomed):
>> - This doesn't work with qlicserver because the internal consumable
>> accounting is done per job, so if 8 instances of TheSoftware are
>> running on an 8-core node, SGE and qlicserver incorrectly think 8
>> licenses are being consumed (even though lmstat reports otherwise).
>> In essence, qlicserver would require modification to handle the
>> floating, dynamically node-locked licenses.
>> - Using a soft resource request makes it possible to open new nodes
>> to TheSoftware jobs when, e.g., a license is available but all
>> existing nodes running TheSoftware are full or there are none
>> currently running. However, particularly in the latter case, when SGE
>> schedules a batch of new jobs in a single interval, there is no time
>> to update the load sensor that indicates TheSoftware is running. This
>> means the new jobs are distributed according to the usual scheduling
>> heuristics, which tend to be the worst case for conserving licenses.
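The accounting that the first issue says qlicserver lacks amounts to reference counting per (user, host) pair. The class below is a hypothetical Python model of that rule, not qlicserver's code and not SGE's consumable mechanism: a license is checked out on a user's first job on a host and returned only when that user's last job there finishes.

```python
class NodeLockedLicenses:
    """Hypothetical per-(user, host) license pool with reference counting.

    A license is checked out when a user's first job starts on a host and
    returned only when that user's last job on that host finishes.
    """

    def __init__(self, total):
        self.total = total
        self.running = {}  # (user, host) -> number of running jobs

    def in_use(self):
        return len(self.running)

    def available(self):
        return self.total - self.in_use()

    def can_start(self, user, host):
        # No new license is needed if this user already holds one on this host.
        return (user, host) in self.running or self.available() > 0

    def start(self, user, host):
        if not self.can_start(user, host):
            raise RuntimeError(f"no TheSoftware license free for {user}@{host}")
        key = (user, host)
        self.running[key] = self.running.get(key, 0) + 1

    def finish(self, user, host):
        key = (user, host)
        self.running[key] -= 1  # KeyError if no such job was started
        if self.running[key] == 0:
            del self.running[key]  # last job gone: license returns to the pool


if __name__ == "__main__":
    pool = NodeLockedLicenses(total=20)
    for _ in range(8):
        pool.start("alice", "node01")
    # Per-job consumable accounting would show 20 - 8 = 12 free here.
    print(pool.in_use(), pool.available())  # 1 19
```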
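The cost of the stale sensor in the second issue can be shown with a toy placement model. This is not SGE's scheduler: "least-loaded first" is an assumed stand-in for the usual heuristics, and the packing variant is what the soft request would achieve if the sensor value were updated within the interval. One user, five 8-slot hosts, 8 new jobs:

```python
def place(n_jobs, hosts, slots, pack):
    """Place n_jobs single-slot TheSoftware jobs (one user) in one interval.

    pack=False: the load sensor is stale, so no host looks preferable and
    the scheduler falls back to least-loaded first (most free slots).
    pack=True: hosts chosen earlier in this same interval are preferred.
    Returns {host: jobs placed}.
    """
    free = {h: slots for h in hosts}
    placed = {h: 0 for h in hosts}
    for _ in range(n_jobs):
        candidates = [h for h in hosts if free[h] > 0]
        if not candidates:
            break  # cluster full; remaining jobs stay pending
        if pack:
            already = [h for h in candidates if placed[h] > 0]
            host = (already or candidates)[0]
        else:
            host = max(candidates, key=lambda h: free[h])
        free[host] -= 1
        placed[host] += 1
    return placed


def licenses_used(placed):
    """One license per host that received at least one job (single user)."""
    return sum(1 for n in placed.values() if n > 0)


if __name__ == "__main__":
    hosts = [f"node{i:02d}" for i in range(1, 6)]
    for pack in (False, True):
        print(pack, licenses_used(place(8, hosts, 8, pack)))
    # False 5
    # True 1
```

With a stale sensor the 8 jobs land on all 5 hosts and take 5 licenses; packed, they fit on one host and take 1.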
>> If you read this far, thanks!
>> To unsubscribe from this discussion, e-mail:
>> [users-unsubscribe at gridengine.sunsource.net].