FW: [GE users] FW: Pick nodes from one queue plus 1 node from another queue

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Mon Apr 4 19:02:19 BST 2005


William Burke wrote:

>Hi Stephan,
>
>  
>
>>Craigs' approach limits the hosts to a the ones usablef for your job.
>>    
>>
>
>I am a little unclear to what you mean ' a the ones usablef'?
>  
>
Sorry, must have lost control over my keyboard. :-))

I meant that you can specify target hostgroups as filters. The remaining
hosts will most likely contain more slots than your PE job needs. The
slot assignment will then be done on a load or sequence-number level.

I hope this is a bit clearer.
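
As a rough illustration of the filtering idea (the hostgroup name @compute and
the PE name mpich below are only placeholders, not names from your setup):

    # restrict the job to queue instances on hosts in the @compute hostgroup;
    # the scheduler then sorts the remaining instances by load or seq_no
    qsub -pe mpich 16 -q '*@@compute' myjob.csh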

Stephan

>William
>
>-----Original Message-----
>From: Stephan Grell - Sun Germany - SSG - Software Engineer
>[mailto:stephan.grell at sun.com] 
>Sent: Monday, April 04, 2005 9:29 AM
>To: William Burke
>Cc: users at gridengine.sunsource.net
>Subject: Re: FW: [GE users] FW: Pick nodes from one queue plus 1 node from
>another queue
>
>
>
>William Burke wrote:
>
>  
>
>>Hi Stephan,
>>
>> 
>>
>>    
>>
>>>From: Stephan Grell - Sun Germany - SSG - Software Engineer
>>>   
>>>
>>>      
>>>
>>[stephan.grell at sun.com] 
>> 
>>
>>    
>>
>>>You can assign a sequence number for each queue instance and change the 
>>>scheduler configuration to use the sequence number for selecting queues 
>>>instead of the load value.
>>>This way you can encode your table in the queue instances. However, this 
>>>is some work.
>>>
>>>Or you could ask the scheduler to do least_used_first / fill_up on the 
>>>different queue instances.
>>>   
>>>
>>>      
>>>
>>It's a little unclear how this may work; can you elaborate?  Does your
>>solution work in conjunction with Craig Tierney's solution? SEE BELOW
>>
>>    
>>
>Craigs' approach limits the hosts to a the ones usablef for your job. After
>the filtering step, you can tell the scheduler, how to sort the queue
>instances.
>
>The "least_used_first" and "fill_up" configuration is not straight
>forward. One
>need to configure slots on host level, "qconf -me <host>" 
>"complex_values        slots=<NR>".
>The next step is the use of slots as a load value.
>"qconf -msconf"
>    "queue_sort_method                 load"
>    "load_formula                      [+-]slots"
>
>  slots : least_used_first
> -slots : fill_up
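
A concrete sketch of that sequence (the host name node01 and the value 4 are
only examples):

    # declare slots as a consumable on each execution host
    qconf -me node01
        complex_values        slots=4

    # sort queue instances by the slots-based load formula
    qconf -msconf
        queue_sort_method     load
        load_formula          slots     (use -slots instead for fill_up)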
>
>Does this help?
>
>Stephan
>
>  
>
>>I asked:
>> 
>>
>>    
>>
>>>>Since I am not aware of SGE possessing this PBS functionality to 
>>>>explicitly pick an exact amount of nodes from one queue and another 
>>>>queue, does anyone have that function implemented in their SGE 
>>>>environment?
>>>>
>>>>     
>>>>
>>>>        
>>>>
>>Craig suggested:
>>* SGE does not support this function.  It is something we used to do with 
>>* PBS and needed to find a solution.  How many ionodes do you have?  Our 
>>* solution wasn't great, but since our IO node count is less than 10 the 
>>* following works.
>>*
>>* 1) Create a host group with the nodes in the compute pool.
>>* 2) For each IO node, create a host group that includes that
>>* 1 ionode and the compute pool host group.  You can specify this when you 
>>* use "qconf -ahgrp" by listing the nodes as "ionodeN @compute".  That way,
>>    
>>
>
>  
>
>>* when the compute nodes change, making a change to @compute changes all of
>>    
>>
>
>  
>
>>* the groups.
>>* 3) Create a parallel environment and cluster queue for each IO node.
>>*
>>* Let's say that each parallel environment for each IO node is called peioNN,
>>* where NN is an integer.  Also, each IO node is named ioMM, where MM is an
>>* integer.  The numbering convention doesn't have to be consistent.
>>*
>>* For a user to submit a job, where there are 3 IO nodes, it would look
>>* like:
>>*
>>* qsub -pe 'peio*' 16 -masterq \*@io1,\*@io2,\*@io3 myjob.csh
>>*
>>* This will do what you want.  It isn't very clean though.  
>>* It will be difficult to let users know of changes and to ensure they do it
>>* right.
>>*
>>* You can write a wrapper script to modify the user's script and qsub line to
>>* do this for them.  For us, the user just submits a job to the virtual pe
>>* 'io'.  If we see that, we remap the options to qsub to look like the
>>* syntax above.  If the command has to change, it exists in one place on the
>>    
>>
>
>  
>
>>* shared filesystem.
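
A minimal sketch of Craig's scheme, assuming three IO nodes named io1-io3, a
compute hostgroup @compute, and PE/queue names peio01 / io1.q (all of these
names are placeholders):

    # hostgroup with the plain compute nodes
    qconf -ahgrp @compute

    # one hostgroup per IO node: that node plus the whole compute pool
    qconf -ahgrp @io1grp
        group_name   @io1grp
        hostlist     io1 @compute

    # one PE and one cluster queue per IO node, e.g. peio01 / io1.q on @io1grp
    qconf -ap peio01
    qconf -aq io1.q

    # submit: any peio* PE, master task forced onto one of the IO nodes
    qsub -pe 'peio*' 16 -masterq '*@io1,*@io2,*@io3' myjob.csh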
>>
>>
>>Regards, 
>>William
>>
>>
>>-----Original Message-----
>>From: Stephan Grell - Sun Germany - SSG - Software Engineer
>>[mailto:stephan.grell at sun.com] 
>>Sent: Monday, April 04, 2005 3:15 AM
>>To: users at gridengine.sunsource.net
>>Subject: Re: FW: [GE users] FW: Pick nodes from one queue plus 1 node from
>>another queue
>>
>>
>>
>>William Burke wrote:
>>
>> 
>>
>>    
>>
>>>Hi,
>>>
>>>
>>>
>>>I am back and would like to know if anyone has a clean way in SGE to
>>>control job submissions among queues?
>>>
>>>
>>>
>>>Quoting Reuti
>>>
>>>
>>>
>>><snip>
>>>
>>>   
>>>
>>>      
>>>
>>>>I think, he means an allocation in PBS like:
>>>>     
>>>>
>>>>-l nodes=ionode:1+compute:5
>>>>     
>>>>
>>>>        
>>>>
>>>1)
>>>
>>>Since I am not aware of SGE possessing this PBS functionality to
>>>explicitly pick an exact amount of nodes from one queue and another
>>>queue, does anyone have that function implemented in their SGE
>>>environment?
>>>
>>>
>>>
>>>2)
>>>
>>>When a user specifies a particular queue:
>>>
>>>
>>>
>>>I would like SGE to first pick all available nodes in that queue which
>>>_belong to the least number of additional queues_. How could I
>>>implement this functionality in SGE? Satisfying this requirement
>>>would avoid unnecessarily using a node that has additional specialized
>>>attributes, such as nodes that are heavily used for io operations.
>>>
>>>
>>>
>>>For example, if 32 CPUs are requested from QueueA.q, I would like
>>>SGE to
>>>
>>>
>>>
>>>SEE BELOW
>>>
>>>
>>>
>>>1.    First check nodes that belong to the fewest queues
>>>(all.q and QueueA.q) which happen to be grid_nodes06-47. 
>>>
>>>2.    If SGE cannot find enough available nodes from those, it should
>>>then check grid_nodes49
>>>
>>>3.    Followed by grid_nodes48 (because grid_nodes49 belongs to fewer
>>>queues than grid_nodes48)
>>>
>>>
>>>
>>>FYI, fatnodes.q only contains grid_nodes48-49; however, grid_nodes48 is
>>>the only node in io.q.
>>>
>>>I generally use grid_nodes48 for special io operations.
>>>
>>>
>>>
>>>So the breakdown is:
>>>
>>>
>>>
>>>Nodes             Queues node belongs to
>>>
>>>------------------------------------------------
>>>
>>>
>>>
>>>grid_nodes06 - all.q, QueueA.q
>>>
>>>.
>>>
>>>.
>>>
>>>grid_nodes47 - all.q, QueueA.q
>>>
>>>grid_nodes48 - all.q, QueueA.q, fatnodes.q, io.q
>>>
>>>grid_nodes49 - all.q, QueueA.q, fatnodes.q
>>>
>>>
>>>
>>>   
>>>
>>>      
>>>
>>You can assign a sequence number for each queue instance and change the
>>scheduler
>>configuration to use the sequence number for selecting queues instead of
>>the load value.
>>This way you can encode your table in the queue instances. However, this
>>is some work.
>>
>>Or you could ask the scheduler to do least_used_first / fill_up on the
>>different queue instances.
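
A small sketch of the sequence-number variant, using the host names from your
table (the seq_no values are arbitrary examples):

    # sort queue instances by sequence number instead of load
    qconf -msconf
        queue_sort_method     seqno

    # prefer the plain nodes, then grid_nodes49, then grid_nodes48
    qconf -mq QueueA.q
        seq_no                0,[grid_nodes49=10],[grid_nodes48=20]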
>>
>>Cheers,
>>Stephan
>>
>> 
>>
>>    
>>
>>>Cheers,
>>>
>>>William
>>>
>>>
>>>
>>>
>>>
>>>------------------------------------------------------------------------
>>>
>>>*From:* William Burke [mailto:wburke999 at msn.com]
>>>*Sent:* Wednesday, March 30, 2005 10:50 AM
>>>*To:* users at gridengine.sunsource.net
>>>*Subject:* RE: [GE users] FW: Pick nodes from one queue plus 1 node
>>>      
>>>
>>>from another queue
>>    
>>
>>>
>>>Reuti
>>>
>>>
>>>
>>><snip>
>>>
>>>   
>>>
>>>      
>>>
>>>>I think, he means an allocation in PBS like:
>>>>     
>>>>
>>>>-l nodes=ionode:1+compute:5
>>>>     
>>>>
>>>>        
>>>>
>>>Yes this is the exact functionality that I need and this would ensure
>>>that the job would include 5 compute hosts and that one ionode in the
>>>$pe_hostfile. Then I could
>>>
>>>
>>>
>>>1.    direct the output of $pe_hostfile to a file that could be
>>>manipulated
>>>
>>>2.    in the startmpi.sh ensure that in the PEHostfiletoMachinefile
>>>conversion the 1 ionode node in $pe_hostfile becomes the last node in
>>>the Machinefile
>>>
>>>
>>>
>>>Does that make sense?
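
One way the reordering in step 2 might look inside a modified startmpi.sh (a
simplified sketch: it assumes the io node can be recognised by its host name,
here grid_nodes48, and ignores the per-host slot expansion the real script does):

    # put all non-io entries first, then the io node, so it ends up last
    grep -v '^grid_nodes48' $pe_hostfile | awk '{print $1}'  > $TMPDIR/machines
    grep    '^grid_nodes48' $pe_hostfile | awk '{print $1}' >> $TMPDIR/machines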
>>>
>>>
>>>
>>>Regards,
>>>
>>>William
>>>
>>>
>>>
>>>-----Original Message-----
>>>From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>>Sent: Wednesday, March 30, 2005 9:26 AM
>>>To: users at gridengine.sunsource.net
>>>Subject: Re: [GE users] FW: Pick nodes from one queue plus 1 node from
>>>another queue
>>>
>>>
>>>
>>>Stephan,
>>>
>>>
>>>
>>>I was thinking of the same. But wouldn't this allow the job to get additional
>>>
>>>slots from the wrong queue as slaves?
>>>
>>>
>>>
>>>I think, he means an allocation in PBS like:
>>>
>>>
>>>
>>>-l nodes=ionode:1+compute:5
>>>
>>>
>>>
>>>to get 6 CPUs - one from the nodes with the feature ionode and 5 with
>>>
>>>the feature compute. - Reuti
>>>
>>>
>>>
>>>
>>>
>>>Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
>>>
>>>   
>>>
>>>      
>>>
>>>>William Burke wrote:
>>>>     
>>>>
>>>>        
>>>>
>>>>>Hi,
>>>>>       
>>>>>
>>>>><Snip>
>>>>>       
>>>>>
>>>>>          
>>>>>
>>>>>>you don't need a special queue to set up for the FatQueue machine.
>>>>>>         
>>>>>>
>>>>>>You can submit with "-masterq QueueA.q@myhost" in qsub.
>>>>>>         
>>>>>>
>>>>>>
>>>>>>         
>>>>>>
>>>>>>            
>>>>>>
>>>>>The thing is, if I pick myhost to be the masterq, what happens if that
>>>>>host is busy with another job and there are other hosts that can be picked?
>>>>>       
>>>>>
>>>>>The robustness that I need in SGE is for it to arbitrarily pick those M-1
>>>>>nodes from QueueA.q and the Mth one from FatQueueB.q. I do not see how
>>>>>the "-masterq QueueA.q@myhost" in qsub will achieve this. Help me to
>>>>>       
>>>>>
>>>>>understand your suggestion.
>>>>>       
>>>>>
>>>>>          
>>>>>
>>>>You can put the M-1 hosts in one cluster queue and the other hosts into
>>>>     
>>>>
>>>>another cluster queue.
>>>>     
>>>>
>>>>A simple approach would be to put all M-1 hosts into the M-1 hostgroup
>>>>     
>>>>
>>>>and all other hosts
>>>>     
>>>>
>>>>into a second hostgroup (others).
>>>>     
>>>>
>>>>You then define a cluster queue over the M-1 and others hostgroups.
>>>>     
>>>>
>>>>The qsub command would look like:
>>>>     
>>>>
>>>>qsub -pe .... -masterq "cluster_queue@@M-1" .....
>>>>     
>>>>
>>>>This ensures that the master task is started on one of the M-1 machines.
>>>>     
>>>>
>>>>Does it help?
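
Sketched out with placeholder names (hostgroups @M-1 and @others, cluster queue
cluster_queue, PE mpich):

    qconf -ahgrp @M-1        # the M-1 "normal" hosts
    qconf -ahgrp @others     # all remaining hosts
    qconf -aq cluster_queue  # hostlist: @M-1 @others

    # force the master task onto one of the @M-1 hosts
    qsub -pe mpich 6 -masterq 'cluster_queue@@M-1' myjob.csh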
>>>>     
>>>>
>>>>Stephan
>>>>     
>>>>
>>>>        
>>>>
>>>>>I think that PBS's qsub has a way to specify a queue and the number of
>>>>>nodes from that queue - Queue:Num_nodes. Does SGE have this built-in
>>>>>functionality?
>>>>>       
>>>>>
>>>>>William
>>>>>       
>>>>>
>>>>>-----Original Message-----
>>>>>       
>>>>>
>>>>>From: Reuti [mailto:reuti at staff.uni-marburg.de] Sent: Wednesday, March
>>>>>       
>>>>>
>>>>>30, 2005 6:23 AM
>>>>>       
>>>>>
>>>>>To: users at gridengine.sunsource.net
>>>>>       
>>>>>
>>>>>Subject: Re: [GE users] FW: Pick nodes from one queue plus 1 node from
>>>>>       
>>>>>
>>>>>another queue
>>>>>       
>>>>>
>>>>>Hi,
>>>>>       
>>>>>
>>>>>you don't need a special queue to set up for the FatQueue machine. You
>>>>>       
>>>>>
>>>>>can submit with "-masterq QueueA.q@myhost" in qsub.
>>>>>       
>>>>>
>>>>>Small problem: SGE may select another slot from this machine, unless
>>>>>       
>>>>>
>>>>>you choose an allocation rule of 1. Then you can be sure that one slot
>>>>>       
>>>>>
>>>>>(the special one) is on the extra machine (so you may give this machine
>>>>>       
>>>>>
>>>>>more slots than the other machines). The other slots will be on other
>>>>>       
>>>>>
>>>>>machines this way. But as this can only be done for the head node of
>>>>>       
>>>>>
>>>>>the parallel job, maybe you have to reorder any operation in your
>>>>>       
>>>>>
>>>>>script, as you requested it to be the last machine.
>>>>>       
>>>>>
>>>>>Cheers - Reuti
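
A short sketch of what Reuti describes, assuming a PE named mpich (placeholder):

    # allocation_rule 1: exactly one slot per host, so the -masterq host
    # contributes only the one "special" slot
    qconf -mp mpich
        allocation_rule       1

    qsub -pe mpich 6 -masterq 'QueueA.q@myhost' myjob.csh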
>>>>>       
>>>>>
>>>>>William Burke wrote:
>>>>>       
>>>>>
>>>>>          
>>>>>
>>>>>>Ultimately I would like to submit a parallel job that uses N-1 nodes
>>>>>>from QueueA.q and 1 node from FatQueueB.q, as long as the node from
>>>>>>FatQueueB.q is the last node on the machinefile list.
>>>>>>         
>>>>>>
>>>>>>            
>>>>>>
>>>------------------------------------------------------------------------
>>>
>>>   
>>>
>>>      
>>>
>>>>>>From: William Burke [mailto:wburke999 at msn.com]
>>>>>>         
>>>>>>
>>>>>>Sent: Wednesday, March 30, 2005 1:23 AM
>>>>>>         
>>>>>>
>>>>>>To: users at gridengine.sunsource.net
>>>>>>         
>>>>>>
>>>>>>Subject: RE: Pick nodes from one queue plus 1 node from another queue
>>>>>>         
>>>>>>
>>>>>>There are M nodes in machine list and lets say that I want to submit
>>>>>>         
>>>>>>
>>>>>>a job that can explicitly pick an exact amount of nodes from one
>>>>>>         
>>>>>>
>>>>>>particular queue and  only one
>>>>>>         
>>>>>>
>>>>>>node from another queue which equals the total # of nodes found in
>>>>>>         
>>>>>>
>>>>>>the $pe_hostfile.
>>>>>>         
>>>>>>
>>>>>>So for instance:
>>>>>>         
>>>>>>
>>>>>>The user launches a parallel job that requests 33 processors. If two
>>>>>>         
>>>>>>
>>>>>>queues exist, QueueA.q (consisting of 45 nodes) and FatQueueB.q
>>>>>>         
>>>>>>
>>>>>>(consisting of 2 nodes from QueueA.q's nodes), the user wants the
>>>>>>         
>>>>>>
>>>>>>ability to specify 32 processors from QueueA.q and only 1 processor
>>>>>>from FatQueueB.q. What is the best way to implement that?
>>>>>          
>>>>>
>>>>>>         
>>>>>>
>>>>>>This is the situation:
>>>>>>         
>>>>>>
>>>>>>1.    The particular application needs N processors for a job
>>>>>>         
>>>>>>
>>>>>>2.    I request this in -pe mpich N parameter
>>>>>>         
>>>>>>
>>>>>>3.    SGE generates M machines in its $pe_hostfile list based on the
>>>>>>         
>>>>>>
>>>>>>Nth processor
>>>>>>         
>>>>>>
>>>>>>4.    As we already know, the algorithm that creates $pe_hostfile says:
>>>>>>create M nodes, where M = N/2 if N is even, else M = (N+1)/2
>>>>>>         
>>>>>>
>>>>>>a.    I need some way to tell SGE that the Mth (or last) node of the
>>>>>>Machinefile list always has to be a node from FatQueue.q; I use those
>>>>>>types of nodes for heavy io processing of the job.
>>>>>>         
>>>>>>
>>>>>>b.    I do not want a job to run unless the Mth (or last) node in the
>>>>>>         
>>>>>>
>>>>>>Machinefile is a node from FatQueue.q; otherwise the job should wait
>>>>>>         
>>>>>>
>>>>>>until that request is filled.
>>>>>>         
>>>>>>
>>>>>>5.    Ultimately the correctly formatted mpirun machinefile gets
>>>>>>         
>>>>>>
>>>>>>created from the final $pe_hostfile of M nodes.
>>>>>>         
>>>>>>
>>>>>>FWIW, usually the amount of processors is odd.
>>>>>>         
>>>>>>
>>>>>>What is very important is that the last node of the mpirun
>>>>>>         
>>>>>>
>>>>>>machinefile list is always from the FatQueueB.q.
>>>>>>         
>>>>>>
>>>>>>Regards,
>>>>>>         
>>>>>>
>>>>>>William
>>>>>>         
>>>>>>
>>>>>>
>>>>>>         
>>>>>>
>>>>>>            
>>>>>>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



