[GE users] SGE6.0u3 global consumable resource - applies to all queues

Walt Minkel wminkel at latticesemi.com
Thu Feb 3 00:44:10 GMT 2005


Hi Reuti,

My first attempt at this problem followed the howto for setting up transfer
queues and license load sensors.  The transfer queues worked great.  The
load sensor, however, seemed to be too slow.  For example, with five
licenses, if three jobs were running and four more were queued, often all
four jobs would be submitted before the load sensor (using FlexLM's lmstat)
could report back the load.  If I delayed the starting of jobs by ~10
seconds, the load sensor could keep up.  Maybe there is a way to dispatch
jobs only after an event like a load refresh?

I found that a consumable resource could keep an exact count, so I decided
to take that path.  A load sensor would be ideal for the reason you state:
a user would only need to request one resource.  It would also
automatically correct the number of licenses if users cheat and start a
job outside the grid.
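For reference, the parsing half of such a load sensor can be sketched
roughly as below.  The feature name "licA", the complex name "licA_us",
and the exact lmstat output wording are assumptions, not taken from an
actual configuration:

```shell
#!/bin/sh
# Sketch of the lmstat-parsing part of a FlexLM load sensor (names are
# assumptions).  Extracts the free-license count from an lmstat line like:
#   Users of licA:  (Total of 5 licenses issued;  Total of 3 licenses in use)
lmstat_free() {
    awk '/Users of/ { print $6 - $11 }'
}

# Demonstrate the parsing on a canned lmstat line:
sample='Users of licA:  (Total of 5 licenses issued;  Total of 3 licenses in use)'
echo "$sample" | lmstat_free    # prints 2 (5 issued - 3 in use)

# A real sensor would loop on stdin until qmaster sends "quit", each cycle
# running lmstat against the license server and reporting e.g.:
#   begin
#   global:licA_us:2
#   end
```

The delay Walt describes comes from this polling cycle: jobs can be
dispatched between two lmstat runs, before the new count is reported.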

   Thanks again for your suggestions,
       Walt

Reuti wrote:

>Hello Walt,
>
>with transfer-queue you mean a setup according to the appropriate Howto?  I 
>think this way you don't need a global resource at all.  Just set up one 
>queue for the local execution with the value of the resource set to 1 and 
>e.g. a sequence number of 10 (the other local queues don't need it set to 
>0, because it's not globally available).  The transfer-queue will get a 
>higher sequence number, a resource count of 5 (with enough slots in this 
>queue), a load_thresholds entry for this resource which should be 5, and a 
>modified load sensor which will also track the remaining count of this 
>license at the remote site (the remote queue will look at both resource 
>restrictions: the one in the queue, which is set up with 5, and the one 
>from the execution host, i.e. the remote cluster, which will be set by the 
>load sensor).
>
>And this way you will still need only one resource for the complete setup, 
>not two.  Also, the users can just request one resource and they are done. 
>To force remote execution for any reason, they can specify the queue name 
>in addition to the resource request.
>
>If the local queue runs out of licenses, the transfer-queue will be 
>checked, and if a license is available (hence the queue is not put into 
>alarm state), the job will be transferred to the remote cluster.
>
>Cheers - Reuti
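As a rough sketch of the setup Reuti describes (the queue names, sequence
numbers, and slot counts below are assumptions, not taken from an actual
configuration):

```shell
# Local queue: tried first (lower seq_no), holds the one local license.
#   $ qconf -sq local.q
#   seq_no           10
#   complex_values   licA_us=1
#
# Transfer queue: tried once local.q is exhausted; holds the 5 remote
# licenses, and goes into alarm state when the load sensor reports all
# 5 in use at the remote site.
#   $ qconf -sq transfer.q
#   seq_no           20
#   complex_values   licA_us=5
#   load_thresholds  licA_us=5
```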
>
>
>Quoting Walt Minkel <wminkel at latticesemi.com>:
>
>  
>
>>Hi Reuti,
>>
>>You did have it right.  The complex "run_licA" defines where the job can 
>>run: the machines, or in my case one machine with several transfer queues.
>>
>>The site and WAN licenses are on two different FlexLM servers.  Jobs are 
>>submitted to SGE and are dispatched to a transfer queue when all 
>>conditions are correct (primarily, when there is a license available). 
>>The transfer queue is selected by "run_licA" (a better name might be 
>>"run_at_siteX").  When a license is available, the job is transferred to 
>>siteX.  In my situation, the license in question is in high demand from 
>>multiple sites.  The challenge is to somehow have SGE understand that a 
>>local site license is available before consuming a WAN license.
>>
>>One solution I think will work is this: if I have 1 site license and 5 
>>WAN licenses, make 6 licenses requestable.  If a license is available, 
>>the job is put into a transfer queue, and a prolog (or the executable 
>>script) determines whether the available license is a site license and 
>>sets the appropriate environment variable.  I can also use the number of 
>>CPUs to help control how many jobs are available to each transfer queue.
>>
>>    -Walt
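A minimal sketch of that prolog idea, assuming FlexLM's standard
LM_LICENSE_FILE mechanism; the server addresses, the "licA" feature, and
the pick_server helper are hypothetical:

```shell
#!/bin/sh
# Hypothetical prolog / job-script fragment: prefer the site license
# server while it has a free seat, otherwise fall back to the WAN server.
SITE=1700@site-lm    # assumed site FlexLM server
WAN=1700@wan-lm      # assumed WAN FlexLM server

# pick_server FREE_SITE_SEATS: echo the server the job should use.
pick_server() {
    if [ "${1:-0}" -gt 0 ]; then
        echo "$SITE"
    else
        echo "$WAN"
    fi
}

# In a real prolog the free-seat count would come from lmstat, e.g.:
#   free=`lmstat -c "$SITE" -f licA | awk '/Users of/ { print $6 - $11 }'`
free=1                                  # stand-in value for illustration
LM_LICENSE_FILE=`pick_server "$free"`   # here: the site server
export LM_LICENSE_FILE
```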
>>
>>Reuti wrote:
>>
>>    
>>
>>>Walt,
>>>
>>>so I got one point wrong: I thought "run_licA" are the machines on which 
>>>it could run.  So you are looking for some kind of resource staging. 
>>>Although this is not directly implemented, in some way it could be 
>>>simulated by setting up two cluster queues and imposing an order with a 
>>>sequence number.  But first: how will you track the usage of the 
>>>worldwide license?  Do you have a central machine for this somewhere 
>>>around?
>>>
>>>Quoting Walt Minkel <wminkel at latticesemi.com>:
>>>
>>>>Hi Reuti,
>>>>
>>>>Thank you for deepening my understanding of the global consumables.
>>>>
>>>>Your description of what I am trying to do is correct.  To fine-tune my 
>>>>need, I should add: in addition to licenses being consumed by different 
>>>>tools, in one tool's case we have worldwide WAN licenses as well as 
>>>>site licenses.  My goal was to use queue sequencing to look first at 
>>>>the site license.  For example, a user could queue into "run_licA" 
>>>>without needing to specify the consumable license "licA_us".
>>>>
>>>>Suggestions are welcome, but based on your comments I think I will look 
>>>>for any available license (site+WAN) and have my execution script sort 
>>>>out which to use.
>>>>
>>>>    -Walt
>>>>
>>>>Reuti wrote:
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>>>Hi there,
>>>>>
>>>>>what you observe is the correct behavior of SGE.  The complex 
>>>>>"licA_us" isn't requestable, and so it is attached to all jobs you 
>>>>>submit to SGE.  You have two in total, so you can only run two jobs of 
>>>>>any type in the whole cluster.  Where is the connection between 
>>>>>"licA_us" and "run_licA" for now?
>>>>>
>>>>>If I understand you in the right way, you want on the one hand a 
>>>>>global limit which is two, and at the same time to specify the nodes 
>>>>>on which this type of job may run at all.
>>>>>
>>>>>You can achieve this when you make the complex requestable; then the 
>>>>>user can request it for this type of job.  But with the current setup 
>>>>>of a second complex "run_licA", you would have to request both 
>>>>>resources to get the desired behavior.
>>>>>
>>>>>1. way)
>>>>>
>>>>>It's easier when you disregard "run_licA" completely, and attach 
>>>>>"licA_us" also to the nodes on which the jobs may run, set to the 
>>>>>number of CPUs in these machines.  The request of the job has to 
>>>>>fulfill both restrictions.
>>>>>
>>>Here I have to correct myself: for nodes not eligible for this type of 
>>>job, it must also be defined and set to 0.
>>>
>>>Cheers - Reuti
>>>
>>> 
>>>
>>>      
>>>
>>>>>2. way)
>>>>>
>>>>>Make one cluster queue for this type of job and set up a hostgroup 
>>>>>with the eligible machines.  For this queue set "licA_us" to the 
>>>>>number of CPUs, and in all other queues set it to 0.  Also here, no 
>>>>>"run_licA" is needed.
>>>>>
>>>>>In both cases, the user has to specify only the request of the resource 
>>>>>"licA_us".
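In practice Reuti's suggestion might look roughly like this (the complex
entry mirrors Walt's with requestable flipped to YES; the hostgroup name,
queue name, and CPU count are assumptions):

```shell
# Make the complex requestable and consumable (edit via qconf -mc):
#   #name      shortcut  type  relop  requestable  consumable  default  urgency
#   lic_bv_us  licA_us   INT   <=     YES          YES         1        0
#
# Way 2: one cluster queue over a hostgroup of eligible machines.
#   $ qconf -sq licA.q
#   hostlist         @licA_hosts
#   complex_values   licA_us=4    # e.g. the CPU count of these hosts; the
#                                 # global value of 2 still caps the total
#   (all other queues:  complex_values  licA_us=0)
#
# Users then request only the one resource:
#   $ qsub -l licA_us=1 myScript.csh
```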
>>>>>
>>>>>Cheers - Reuti
>>>>>
>>>>>Quoting Walt Minkel <wminkel at latticesemi.com>:
>>>>>
>>>>>>Hi All,
>>>>>>
>>>>>>I am trying to use global consumable resources to manage a variety of 
>>>>>>licenses using SGE6.0u3.  I'm not seeing the behavior I expect...  As 
>>>>>>soon as I modify the global execution host consumable form in qmon, 
>>>>>>every queue in my grid is forced to find that resource available 
>>>>>>before a job is run.  In some cases I only have two licenses. 
>>>>>>Defining a global resource for these two licenses limits the total 
>>>>>>number of jobs capable of running at one time to two for the entire 
>>>>>>grid.
>>>>>>
>>>>>>My complex looks like this:
>>>>>>
>>>>>>#name      shortcut  type  relop  requestable  consumable  default  urgency
>>>>>>lic_bv_us  licA_us   INT   <=     NO           YES         1        0
>>>>>>
>>>>>>I am using the following to execute my script:
>>>>>>
>>>>>>qsub -l run_licA=1 myScript.csh
>>>>>>
>>>>>>("run_licA" is a complex I use to direct where the job can run.)
>>>>>>
>>>>>>When more than two jobs (total for all queues) are queued, the 
>>>>>>pending jobs show this message:
>>>>>>
>>>>>>      (-l run_licA) cannot run globally because for default request 
>>>>>>      it offers only gc:licA_us=0.0000
>>>>>>
>>>>>>Maybe I'm missing something, or my approach is wrong.  Any suggestions 
>>>>>>would be greatly
>>>>>>appreciated.
>>>>>>
>>>>>>Thanks,
>>>>>>Walt
>>>>>>
>>>>>>
>>>>>>---------------------------------------------------------------------
>>>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>>




More information about the gridengine-users mailing list