[GE users] Floating node-locked license management and qlicserver

reuti reuti at staff.uni-marburg.de
Thu Jul 30 11:49:56 BST 2009


On 30.07.2009 at 10:43, daireb wrote:

> This is a problem I have come across before (you can probably
> search the mailing list for my posts on it). I think the simplest
> solution in the end is to set aside some compute nodes for
> jobs with these licenses. So if you have 40 licenses you would set
> aside 40 hosts. Obviously this is very wasteful if you have a lot
> of licenses, but the licenses may well cost much more than a
> compute node anyway. Perhaps you could also tell jobs which request
> the licenses to just "prefer" these machines and all other jobs to  
> "prefer to avoid" these machines but that sounds like what you have  
> with your load sensors anyway.
>
> There may be some new tricks in SGE's armoury (v6.2) since I last  
> looked at this to improve the "single license consumed for entire  
> host" licensing scheme.

It's already an RFE as an extension to the just upgraded "consumable"  
attribute, but not yet implemented.

-- Reuti


> Daire
>
> ----- "blair" <blair.bethwaite at infotech.monash.edu.au> wrote:
>
>> Hi all,
>>
>> Apologies for the long post but I'm trying to share some experience
>> as well as ask a couple of questions...
>>
>> Recently I spent a good portion of a week struggling with the
>> following resource management issue in SGE (currently using 6.1u4)...
>> We're making available a software package, let's call it TheSoftware,
>> which uses the FLEXlm license manager to dish out licenses on a rather
>> odd basis - they are floating, but once assigned, node-locked (at
>> least this seems to be the terminology used in lmstat).
>>
>> What this actually means is that one license for TheSoftware is
>> consumed per user per node. We're running a commodity high-throughput
>> cluster so naturally all the nodes have many cpus/cores so it is
>> possible for a user to have several instances of TheSoftware running
>> on a single node, which serendipitously means they can get extra bang
>> for a single license. E.g. say there are five 8core nodes (with 8
>> slots each) free, 20 licenses for TheSoftware available, and a user
>> submits 40 jobs using TheSoftware - all 40 jobs can run in parallel
>> and only consume 5 TheSoftware licenses. However, making the scheduler
>> manage this seems to be far from trivial - even when simplified for
>> the case of a single user running TheSoftware...
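
To make the arithmetic above concrete, here is a minimal sketch of the "one license per user per node" rule as described in the post. The user name, node names, and the `licenses_consumed` helper are hypothetical and only serve the illustration; the real count is whatever lmstat reports.

```python
def licenses_consumed(placements):
    """Count licenses under a 'one license per user per node' scheme.

    placements: iterable of (user, node) pairs, one pair per running job.
    Several jobs of the same user on the same node share a single license.
    """
    return len(set(placements))


# Five 8-slot nodes, one (hypothetical) user, every slot filled: 40 jobs.
nodes = [f"node{i}" for i in range(5)]
placements = [("blair", node) for node in nodes for _ in range(8)]

print(len(placements))                # 40 running jobs
print(licenses_consumed(placements))  # 5 licenses, one per node in use
```

A second user running on the same five nodes would add five more pairs and therefore five more licenses under this rule.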
>>
>> From the start I adopted a hybrid consumable + load sensor approach
>> that turns out to be very similar to that documented here
>> (http://wiki.gridengine.info/wiki/index.php/Olesen-FLEXlm-Integration),
>> which I found later - the main difference being that mine is just
>> shell scripts with no caching and is specific to TheSoftware licenses.
>> It took me a while to figure out how to get SGE to maximise use of
>> licenses by preferring to schedule jobs requesting TheSoftware
>> licenses/consumable to a machine already running TheSoftware. My
>> initial thought was to create a queue for each host with increasing
>> sequence numbers but that seemed like a very roundabout (and painful
>> to maintain) hack. After being frustrated by the documentation
>> regarding load sensors in the admin guide I finally stumbled onto the
>> sge_conf man page and realised this could all be configured on a per
>> host basis, and also that the load_sensor field was actually a list of
>> paths. So I added a new sensor to the global configuration that simply
>> sets a boolean resource value on each host indicating whether
>> TheSoftware is running there; jobs then make a soft resource request
>> for this resource to be true. This approach works to a certain
>> extent.
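
The sensor described above could look roughly like the following. This is a sketch in Python rather than the original shell script, which is not shown in the post. It follows the usual load sensor convention: the execution daemon writes a line to the sensor's standard input once per load report interval, the sensor answers with a `begin` / `host:resource:value` / `end` block, and it exits when it reads `quit`. The resource name `thesoftware_running`, the process name `TheSoftware`, and the use of `pgrep` are assumptions made for the example.

```python
import io
import socket
import subprocess

RESOURCE = "thesoftware_running"  # hypothetical complex (resource) name
PROCESS = "TheSoftware"           # hypothetical process name to look for


def is_running(process=PROCESS):
    """Return True if pgrep finds a process with exactly this name."""
    result = subprocess.run(["pgrep", "-x", process],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0


def report(host, running):
    """Format one load report; the boolean is sent as 1 or 0 (assumed
    to be acceptable for a BOOL complex)."""
    return f"begin\n{host}:{RESOURCE}:{int(running)}\nend\n"


def serve(inp, out, host=None, probe=is_running):
    """Answer every input line with a report until 'quit' is read."""
    host = host or socket.gethostname()
    for line in inp:
        if line.strip() == "quit":
            break
        out.write(report(host, probe()))
        out.flush()


# Demonstration with in-memory streams and a fake probe; a real sensor
# would call serve(sys.stdin, sys.stdout) instead.
out = io.StringIO()
serve(io.StringIO("\n\nquit\n"), out, host="node1", probe=lambda: True)
print(out.getvalue(), end="")
```

A job would then ask for the resource softly, for example with `qsub -soft -l thesoftware_running=true job.sh`, where the resource name is again the hypothetical one from this sketch.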
>>
>> A couple of issues remain though (any suggestions would be welcomed):
>> - This doesn't work with qlicserver because the internal consumable
>> accounting is done per job, so if 8 instances of TheSoftware are
>> running on an 8-core node, SGE and qlicserver incorrectly think 8
>> licenses are being consumed (even though lmstat reports otherwise). In
>> essence, qlicserver would require modification to handle the floating
>> dynamically node-locked licenses.
>> - Using a soft resource request makes it possible to open new nodes to
>> TheSoftware jobs when e.g. a license is available but all existing
>> nodes running TheSoftware are full or there are none currently
>> running. However, particularly in the latter case, when SGE schedules
>> a batch of new jobs in a single interval there is no time to update
>> the load sensor that indicates TheSoftware is running. This means the
>> new jobs are distributed according to the usual scheduling heuristics
>> which tend to be worst case for conserving licenses.
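
Both remaining issues can be seen in a toy model. The sketch below places a batch of single-user jobs within one scheduling interval, once by always choosing the emptiest node and once by always choosing the fullest node that still has a free slot. Treating "the usual scheduling heuristics" as emptiest-node-first is an assumption of this example, not a description of SGE's actual scheduler, and all names are hypothetical.

```python
def place(jobs, nodes, slots, pack):
    """Place jobs one by one and return {node: number of jobs}.

    pack=True  picks the fullest node that still has a free slot.
    pack=False picks the emptiest node (a stand-in for load-based spreading).
    """
    used = {node: 0 for node in nodes}
    for _ in range(jobs):
        free = [node for node in nodes if used[node] < slots]
        if not free:
            break  # cluster is full, remaining jobs stay pending
        choose = max if pack else min
        target = choose(free, key=lambda node: used[node])
        used[target] += 1
    return used


def licenses_in_use(used):
    """Single user: one license per node that runs at least one job."""
    return sum(1 for count in used.values() if count > 0)


nodes = [f"node{i}" for i in range(5)]
spread = place(8, nodes, slots=8, pack=False)
packed = place(8, nodes, slots=8, pack=True)

print(licenses_in_use(spread))  # 5: the batch lands on every node
print(licenses_in_use(packed))  # 1: the batch fills a single node
print(sum(packed.values()))     # 8: what per-job accounting would charge
```

The gap between the last two numbers is the qlicserver mismatch from the first point, and the gap between the first two is the cost of scheduling a whole batch before the load sensor has had a chance to update.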
>>
>> If you read this far, thanks!
>> Regards,
>> ~Blair
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210190
>>
>> To unsubscribe from this discussion, e-mail:
>> [users-unsubscribe at gridengine.sunsource.net].

