FW: [GE users] FW: Pick nodes from one queue plus 1 node from another queue

Craig Tierney ctierney at hpti.com
Mon Apr 4 16:56:20 BST 2005


On Mon, 2005-04-04 at 08:41, William Burke wrote:
> Hi Stephan,
> 
> > Craigs' approach limits the hosts to a the ones usablef for your job.
> 
> I am a little unclear as to what you mean by ' a the ones usablef'?
> 

I think what he means is that when I designate a node as
an IO node, it can never be used by another process.
With some of the methods that Stephan described, an IO
node would tend to be used as an IO node, but if the system
filled up, other jobs could get it as well.

For us, only a few applications need an IO node, and
we are happy leaving it idle so those applications can
get it when they want.  If/when SGE supports advance
reservations we can probably be more flexible.

Craig



> William
> 
> -----Original Message-----
> From: Stephan Grell - Sun Germany - SSG - Software Engineer
> [mailto:stephan.grell at sun.com] 
> Sent: Monday, April 04, 2005 9:29 AM
> To: William Burke
> Cc: users at gridengine.sunsource.net
> Subject: Re: FW: [GE users] FW: Pick nodes from one queue plus 1 node from
> another queue
> 
> 
> 
> William Burke wrote:
> 
> >Hi Stephan,
> >
> >>From: Stephan Grell - Sun Germany - SSG - Software Engineer
> >>[stephan.grell at sun.com]
> >>
> >>You can assign a sequence number for each queue instance and change the
> >>scheduler configuration to use the sequence number for selecting queues
> >>instead of the load value.
> >>This way you can encode your table in the queue instances. However, this
> >>is some work.
> >>
> >>Or you could ask the scheduler to do least_used_first / fill_up on the
> >>different queue instances.
> >
> >It's a little unclear how this would work; can you elaborate?  Does your
> >solution work in conjunction with Craig Tierney's solution? SEE BELOW
> >
> Craigs' approach limits the hosts to a the ones usablef for your job. After
> the filtering step, you can tell the scheduler, how to sort the queue
> instances.
> 
> The "least_used_first" and "fill_up" configuration is not
> straightforward. One needs to configure slots at the host level,
> "qconf -me <host>"
>     "complex_values        slots=<NR>".
> The next step is to use slots as a load value.
> "qconf -msconf"
>     "queue_sort_method                 load"
>     "load_formula                      [+-]slots"
>
>  slots : is least_used_first
> -slots : is fill_up
> 
> Does this help?
> 
> Stephan
> 
> >
> >I asked:
> >  
> >
> >>>Since I am not aware of SGE possessing this PBS functionality to 
> >>>explicitly pick an exact amount of nodes from one queue and another 
> >>>queue, does anyone have that function implemented in their SGE 
> >>>environment?
> >>>
> >>>      
> >>>
> >
> >Craig suggested:
> >* SGE does not support this function.  It is something we used to do with 
> >* PBS and needed to find a solution.  How many ionodes do you have?  Our 
> >* solution wasn't great, but since our IO node count is less than 10 the 
> >* following works.
> >*
> >* 1) Create a host group with the nodes in the compute pool.
> >* 2) For each IO node, create a host group that includes that
> >* 1 ionode and the compute pool host group.  You can specify this when you
> >* use "qconf -ahgrp" by listing the nodes as "ionodeN @compute".  That way,
> >* when the compute nodes change, making a change to @compute changes all of
> >* the groups.
> >* 3) Create a parallel environment and cluster queue for each IO node.
> >*
> >* Let's say that the parallel environment for each IO node is called peioNN,
> >* where NN is an integer.  Also, each IO node is named ioMM, where MM is an
> >* integer.  The numbering convention doesn't have to be consistent.
> >*
> >* For a user to submit a job, where there are 3 IO nodes, it would look
> >* like:
> >*
> >* qsub -pe 'peio*' 16 -masterq \*@io1,\*@io2,\*@io3 myjob.csh
> >*
> >* This will do what you want.  It isn't very clean though.
> >* It will be difficult to let users know of changes and to ensure they do it
> >* right.
> >*
> >* You can write a wrapper script to modify the user's script and qsub line to
> >* do this for them.  For us, the user just submits a job to the virtual pe
> >* 'io'.  If we see that, we remap the options to qsub to look like the
> >* syntax above.  If the command has to change, it exists in one place on the
> >* shared filesystem.
> >
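A minimal sketch of the wrapper idea described above, in Python. The
virtual pe name 'io', the 'peio*' pattern and the io1..io3 host names come
from the thread; the function name remap_qsub_args and the way the options
are scanned are assumptions for illustration, not Craig's actual script.
Each -masterq entry is assumed to use the '*@host' form. The sketch only
rewrites the argument list; it does not call qsub.

```python
import shlex

# Hypothetical site settings, kept in one place on the shared filesystem.
IO_NODES = ["io1", "io2", "io3"]
VIRTUAL_PE = "io"
REAL_PE_PATTERN = "peio*"


def remap_qsub_args(args, io_nodes=IO_NODES):
    """Rewrite '-pe io N' into '-pe peio* N -masterq *@io1,*@io2,...'.

    Any other arguments are passed through unchanged.
    """
    out = []
    i = 0
    while i < len(args):
        is_virtual_pe = (
            args[i] == "-pe"
            and i + 2 < len(args)
            and args[i + 1] == VIRTUAL_PE
        )
        if is_virtual_pe:
            masterq = ",".join(f"*@{node}" for node in io_nodes)
            out += ["-pe", REAL_PE_PATTERN, args[i + 2], "-masterq", masterq]
            i += 3
        else:
            out.append(args[i])
            i += 1
    return out


if __name__ == "__main__":
    user_args = ["-pe", "io", "16", "myjob.csh"]
    # shlex.join quotes the wildcards so the shell does not expand them.
    print(shlex.join(["qsub"] + remap_qsub_args(user_args)))
```

Keeping the node list and pe pattern as constants in the wrapper matches
the point made above: when the IO nodes change, only this one file has to
be edited.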
> >
> >Regards, 
> >William
> >
> >
> >-----Original Message-----
> >From: Stephan Grell - Sun Germany - SSG - Software Engineer
> >[mailto:stephan.grell at sun.com] 
> >Sent: Monday, April 04, 2005 3:15 AM
> >To: users at gridengine.sunsource.net
> >Subject: Re: FW: [GE users] FW: Pick nodes from one queue plus 1 node from
> >another queue
> >
> >
> >
> >William Burke wrote:
> >
> >  
> >
> >>Hi,
> >>
> >> 
> >>
> >>I am back and would like to know if anyone has a clean way in SGE to
> >>control job submissions among queues?
> >>
> >> 
> >>
> >>Quoting Reuti
> >>
> >> 
> >>
> >><snip>
> >>
> >>    
> >>
> >>>I think, he means an allocation in PBS like:
> >>>      
> >>>
> >>>-l nodes=ionode:1+compute:5
> >>>      
> >>>
> >> 
> >>
> >>1)
> >>
> >>Since I am not aware of SGE possessing this PBS functionality to
> >>explicitly pick an exact amount of nodes from one queue and another
> >>queue, does anyone have that function implemented in their SGE
> >>environment?
> >>
> >>2)
> >>
> >>When a user specifies a particular queue:
> >>
> >>I would like SGE to first pick all available nodes in that queue which
> >>_belong to the least number of additional queues_. How could I
> >>implement this functionality in SGE? Satisfying this requirement
> >>would avoid unnecessarily using a node that has additional specialized
> >>attributes, such as nodes that are heavily used for io operations.
> >>
> >>For example, if 32 cpus are requested from QueueA.q, I would like
> >>SGE to
> >>
> >>SEE BELOW
> >>
> >>1.    First check nodes that belong to the fewest queues
> >>(all.q and QueueA.q), which happen to be grid_nodes06-47.
> >>
> >>2.    If SGE cannot find enough available nodes from those, it should
> >>then check grid_nodes49
> >>
> >>3.    Followed by grid_nodes48 (because grid_nodes49 belongs to fewer
> >>queues than grid_nodes48)
> >>
> >>FYI fatnodes.q only contains grid_nodes48-49; however, grid_nodes48 is
> >>the only node in io.q.
> >>
> >>I generally use grid_nodes48 for special io operations.
> >>
> >>So the breakdown is:
> >>
> >>Nodes          Queues node belongs to
> >>------------------------------------------------
> >>grid_nodes06 - all.q, QueueA.q
> >>.
> >>.
> >>grid_nodes47 - all.q, QueueA.q
> >>grid_nodes48 - all.q, QueueA.q, fatnodes.q, io.q
> >>grid_nodes49 - all.q, QueueA.q, fatnodes.q
> >>
> >You can assign a sequence number for each queue instance and change the
> >scheduler configuration to use the sequence number for selecting queues
> >instead of the load value.
> >This way you can encode your table in the queue instances. However, this
> >is some work.
> >
> >Or you could ask the scheduler to do least_used_first / fill_up on the
> >different queue instances.
> >
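One way to read "encode your table in the queue instances" is to derive a
sequence number per host from the number of queues it belongs to. The
Python sketch below only computes those numbers from the table quoted
above; the function name and the dictionary layout are assumptions, and
applying the result (for example through the queue's seq_no setting with
the scheduler sorting by sequence number) is left to the administrator.

```python
def sequence_numbers(memberships):
    """Rank hosts so that hosts in fewer queues get lower numbers.

    memberships maps a host name to the list of queues it belongs to.
    Hosts with the same number of queues share the same rank.
    """
    counts = sorted({len(queues) for queues in memberships.values()})
    rank_of = {count: rank for rank, count in enumerate(counts)}
    return {host: rank_of[len(queues)] for host, queues in memberships.items()}


# Excerpt of the table from the thread (grid_nodes07-46 omitted).
memberships = {
    "grid_nodes06": ["all.q", "QueueA.q"],
    "grid_nodes47": ["all.q", "QueueA.q"],
    "grid_nodes48": ["all.q", "QueueA.q", "fatnodes.q", "io.q"],
    "grid_nodes49": ["all.q", "QueueA.q", "fatnodes.q"],
}

seq = sequence_numbers(memberships)
for host in sorted(seq, key=lambda h: (seq[h], h)):
    print(host, seq[host])
```

With these numbers the plain compute nodes come first, then grid_nodes49,
then grid_nodes48, which is the order William asked for.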
> >Cheers,
> >Stephan
> >
> >  
> >
> >>Cheers,
> >>
> >>William
> >>
> >> 
> >>
> >> 
> >>
> >>------------------------------------------------------------------------
> >>
> >>*From:* William Burke [mailto:wburke999 at msn.com]
> >>*Sent:* Wednesday, March 30, 2005 10:50 AM
> >>*To:* users at gridengine.sunsource.net
> >>*Subject:* RE: [GE users] FW: Pick nodes from one queue plus 1 node
> >>from another queue
> >>
> >> 
> >>
> >>Reuti
> >>
> >> 
> >>
> >><snip>
> >>
> >>    
> >>
> >>>I think, he means an allocation in PBS like:
> >>>      
> >>>
> >>>-l nodes=ionode:1+compute:5
> >>>      
> >>>
> >> 
> >>
> >>Yes this is the exact functionality that I need and this would ensure
> >>that the job would include 5 compute hosts and that one ionode in the
> >>$pe_hostfile. Then I could
> >>
> >> 
> >>
> >>1.    direct the output of $pe_hostfile to a file that could be
> >>manipulated
> >>
> >>2.    in the startmpi.sh ensure that in the PEHostfiletoMachinefile
> >>conversion the 1 ionode node in $pe_hostfile becomes the last node in
> >>the Machinefile
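For step 2 above, a minimal Python sketch of the reordering; this is not
the stock startmpi.sh logic. It assumes the usual $pe_hostfile layout (host
name, slot count, queue, processor range on each line) and an MPICH-style
machinefile with one host name per slot. The function name and the sample
input are illustrative only.

```python
def pe_hostfile_to_machinefile(pe_hostfile_text, io_hosts):
    """Return machinefile lines with every IO host moved to the end.

    pe_hostfile_text: contents of $pe_hostfile as a string.
    io_hosts: collection of host names that must come last.
    """
    compute_lines = []
    io_lines = []
    for line in pe_hostfile_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        target = io_lines if host in io_hosts else compute_lines
        target.extend([host] * slots)  # one entry per granted slot
    return compute_lines + io_lines


sample = (
    "grid_nodes48 1 io.q@grid_nodes48 UNDEFINED\n"
    "grid_nodes06 2 QueueA.q@grid_nodes06 UNDEFINED\n"
    "grid_nodes07 2 QueueA.q@grid_nodes07 UNDEFINED\n"
)
machinefile = pe_hostfile_to_machinefile(sample, io_hosts={"grid_nodes48"})
print("\n".join(machinefile))
```

The relative order of the compute hosts is kept as SGE listed them; only
the IO host is moved.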
> >>
> >> 
> >>
> >>Does that make sense?
> >>
> >> 
> >>
> >>Regards,
> >>
> >>William
> >>
> >> 
> >>
> >>-----Original Message-----
> >>From: Reuti [mailto:reuti at staff.uni-marburg.de]
> >>Sent: Wednesday, March 30, 2005 9:26 AM
> >>To: users at gridengine.sunsource.net
> >>Subject: Re: [GE users] FW: Pick nodes from one queue plus 1 node from
> >>another queue
> >>
> >> 
> >>
> >>Stephan,
> >>
> >>I was thinking of the same. But wouldn't this allow the job to get
> >>additional slots from the wrong queue as slaves?
> >>
> >>I think he means an allocation in PBS like:
> >>
> >>-l nodes=ionode:1+compute:5
> >>
> >>to get 6 CPUs - one from the nodes with the feature ionode and 5 with
> >>the feature compute. - Reuti
> >>
> >>Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
> >>
> >>    
> >>
> >>>William Burke wrote:
> >>>
> >>>>Hi,
> >>>>
> >>>><Snip>
> >>>>
> >>>>>you don't need a special queue to set up for the FatQueue machine.
> >>>>>You can submit with "-masterq QueueA.q@myhost" in qsub.
> >>>>
> >>>>The thing is, if I pick myhost to be masterq, what happens if that host is
> >>>>busy with another job and there are other hosts that can be picked?
> >>>>The robustness that I need in SGE is for it to arbitrarily pick those M-1
> >>>>nodes from QueueA.q and the Mth one from FatQueueB.q. I do not see
> >>>>how the "-masterq QueueA.q@myhost" in qsub will achieve this. Help me to
> >>>>understand your suggestion.
> >>>
> >>>You can put the M-1 hosts in one cluster queue and the other hosts into
> >>>another cluster queue.
> >>>A simple approach would be to put all M-1 hosts into the M-1 hostgroup
> >>>and all other hosts into a second hostgroup (others).
> >>>You then define a cluster queue on the M-1 and other hostgroup.
> >>>The qsub command would look like:
> >>>
> >>>qsub -pe .... -masterq "cluster_queue@@M-1" .....
> >>>
> >>>This ensures that the master task is started on one of the M-1 machines.
> >>>
> >>>Does it help?
> >>>
> >>>Stephan
> >>>
> >>>>I think that PBS's qsub has a way to specify a queue and the number of
> >>>>nodes from that queue - Queue:Num_nodes. Does SGE have this built-in
> >>>>functionality?
> >>>>
> >>>>William
> >>>>
> >>>>-----Original Message-----
> >>>>From: Reuti [mailto:reuti at staff.uni-marburg.de]
> >>>>Sent: Wednesday, March 30, 2005 6:23 AM
> >>>>To: users at gridengine.sunsource.net
> >>>>Subject: Re: [GE users] FW: Pick nodes from one queue plus 1 node from
> >>>>another queue
> >>>>
> >>>>Hi,
> >>>>
> >>>>you don't need a special queue to set up for the FatQueue machine. You
> >>>>can submit with "-masterq QueueA.q@myhost" in qsub.
> >>>>
> >>>>Small problem: SGE may select another slot from this machine, unless
> >>>>you choose an allocation rule of 1. Then you can be sure that only one
> >>>>slot (the special one) is on the extra machine (so you may give this
> >>>>machine more slots than the other machines). The other slots will be on
> >>>>other machines this way. But as this can only be done for the head node
> >>>>of the parallel job, maybe you have to reorder any operation in your
> >>>>script, as you requested it to be the last machine.
> >>>>
> >>>>Cheers - Reuti
> >>>>
> >>>>William Burke wrote:
> >>>>
> >>>>>Ultimately I would like to submit a parallel job that uses N-1 nodes
> >>>>>from QueueA.q and 1 node from FatQueueB.q as long as the node from
> >>>>>FatQueueB.q is the last node on the machinefile list
> >>>>>
> >>------------------------------------------------------------------------
> >>
> >>>>>From: William Burke [mailto:wburke999 at msn.com]
> >>>>>Sent: Wednesday, March 30, 2005 1:23 AM
> >>>>>To: users at gridengine.sunsource.net
> >>>>>Subject: RE: Pick nodes from one queue plus 1 node from another queue
> >>>>>
> >>>>>There are M nodes in the machine list, and let's say that I want to
> >>>>>submit a job that can explicitly pick an exact amount of nodes from one
> >>>>>particular queue and only one node from another queue, which together
> >>>>>equal the total # of nodes found in the $pe_hostfile.
> >>>>>
> >>>>>So for instance:
> >>>>>
> >>>>>The user launches a parallel job that requests 33 processors. If two
> >>>>>queues exist, QueueA.q (consisting of 45 nodes) and FatQueueB.q
> >>>>>(consisting of 2 nodes from QueueA.q's nodes), and the user wants the
> >>>>>ability to specify 32 processors from QueueA.q and only 1 processor
> >>>>>from FatQueueB.q, what is the best way to implement that?
> >>>>>
> >>>>>This is the situation:
> >>>>>
> >>>>>1.    The particular application needs N processors for a job
> >>>>>
> >>>>>2.    I request this in the -pe mpich N parameter
> >>>>>
> >>>>>3.    SGE generates M machines in its $pe_hostfile list based on the
> >>>>>Nth processor
> >>>>>
> >>>>>4.    As we already know, the algorithm that creates $pe_hostfile says
> >>>>>create M nodes {if N is an even number then M should be
> >>>>>N/2 else M should be (N+1)/2}
> >>>>>
> >>>>>a.    I need some way to tell SGE that the Mth (or last) node of the
> >>>>>Machinefile list always has to be a node from FatQueueB.q, since I
> >>>>>use those types of nodes for heavy io processing of the job.
> >>>>>
> >>>>>b.    I do not want a job to run unless the Mth (or last) node in the
> >>>>>Machinefile is a node from FatQueueB.q; otherwise the job should wait
> >>>>>until that request is filled.
> >>>>>
> >>>>>5.    Ultimately the correctly formatted mpirun machinefile gets
> >>>>>created from the final $pe_hostfile of M nodes.
> >>>>>          
> >>>>>
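The host count in item 4 above amounts to a ceiling division. The short
Python sketch below assumes two slots per host, which is what the N/2 rule
implies but which the thread does not state outright; the function name is
made up for illustration.

```python
def hosts_needed(processors, slots_per_host=2):
    """Number of hosts M needed for N processors (ceiling of N / slots)."""
    if processors < 1 or slots_per_host < 1:
        raise ValueError("processors and slots_per_host must be positive")
    # Negating twice turns floor division into ceiling division.
    return -(-processors // slots_per_host)


for n in (32, 33):
    print(n, "processors ->", hosts_needed(n), "hosts")
```

For the 33-processor example this gives 17 hosts: 16 hosts supply the 32
compute slots, and the 17th host would be the one taken from FatQueueB.q.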
> >>>>>FWIW, usually the number of processors is odd.
> >>>>>
> >>>>>What is very important is that the last node of the mpirun
> >>>>>machinefile list is always from FatQueueB.q.
> >>>>>
> >>>>>Regards,
> >>>>>
> >>>>>William
> >>>>>
> >>>>---------------------------------------------------------------------
> >>>>        
> >>>>
> >>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>>>        
> >>>>
> >>>>For additional commands, e-mail: users-help at gridengine.sunsource.net

