[GE users] mpi jobs and forced requestable resources

eddale eddale at cs.unc.edu
Thu Aug 6 18:43:53 BST 2009


We have something similar.  We solved it by attaching the largeNode
complex to a queue instead of to the exechosts.  So we have three
queues:

all.q - Contains all nodes, no complex value
comp.q - Contains all normal nodes, no complex value
himem.q - Contains himem nodes, largeNode=TRUE

From your list of outcomes, #1 would be dispatched to all.q, while #2
and #3 would still fail because there aren't enough largeNode slots.
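
To make the slot arithmetic concrete, here is a rough Python sketch of
the two layouts.  It is only a toy model, not how the Grid Engine
scheduler is implemented: the Queue class and eligible_slots() are
made-up names, the 960/64 slot counts are taken from your post, the
"normal-hosts"/"himem-hosts" entries are a stand-in for the per-host
setup, and I'm assuming a queue is usable only when the forced
complexes on it match what the job asked for with a hard request.
Soft requests are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Queue:
    name: str
    hosts: frozenset                 # host groups this queue spans
    forced: frozenset = frozenset()  # forced complexes attached to the queue


def eligible_slots(queues, host_slots, requested=frozenset()):
    """Count slots a job could reach, given its hard resource requests."""
    hosts = set()
    for q in queues:
        # Simplified rule (assumption): a queue is usable only when the
        # forced complexes it carries are exactly the ones the job requested.
        if q.forced == requested:
            hosts |= q.hosts
    # Assumption: each host contributes its slots once, even if several
    # eligible queues span it.
    return sum(host_slots[h] for h in hosts)


# Slot counts from the original post, aggregated per host group.
HOST_SLOTS = {"normal": 960, "himem": 64}
LARGE = frozenset({"largeNode"})

# Original setup: the forced complex effectively splits the hosts in two.
per_host = [
    Queue("normal-hosts", frozenset({"normal"})),
    Queue("himem-hosts", frozenset({"himem"}), LARGE),
]

# Three-queue setup described in this reply.
three_queues = [
    Queue("all.q", frozenset({"normal", "himem"})),
    Queue("comp.q", frozenset({"normal"})),
    Queue("himem.q", frozenset({"himem"}), LARGE),
]

print(eligible_slots(per_host, HOST_SLOTS))             # 960
print(eligible_slots(per_host, HOST_SLOTS, LARGE))      # 64
print(eligible_slots(three_queues, HOST_SLOTS))         # 1024
print(eligible_slots(three_queues, HOST_SLOTS, LARGE))  # 64
```

In this model only the request without largeNode reaches all 1024
slots, and only in the three-queue layout, because all.q spans the
himem nodes without carrying the forced complex.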

Edward

craffi said the following on 8/6/09 1:29 PM:
> Hi folks,
> 
> I've got a 1024-core cluster with 960 processors in "normal" servers  
> and an additional 64 processors in special "large memory" server nodes.
> 
> We made the large memory nodes usable only if the user requests a
> special forced resource "qsub -l largeNode=true ..." so that they are
> not used for everyday, run-of-the-mill jobs. This works fine for
> normal usage.
> 
> However I'm having trouble with that forced complex when trying to run  
> a parallel task across all 1024 cores ...
> 
> Outcomes:
> 
> (1) "qsub -pe openmpi 1024": Job pends forever because there are "only  
> 960 slots"
> 
> (2) "qsub -l largeNode=true -pe openmpi 1024": Job pends forever  
> because "there are only 64 slots"
> 
> (3) "qsub -soft -l largeNode=true -pe openmpi 1024": Same error --  
> only 64 slots are available with the forced resource request applied
> 
> 
> Right now it looks like the forced requestable attached to the  
> largeNodes means that I can't have an MPI job span regular and "large"
> systems.
> 
> This limits me to 960 cores if I stick to the "normal" nodes and 64
> cores if I request the largeNode resources.
> 
> I had hoped that making a soft resource request for the "forced"  
> resource would let me run a 1024-way MPI job but it does not seem to  
> work.
> 
> Looking for confirmation that I'm stuck in the current config or a  
> clue on how to properly make the qsub call so that my job can span  
> nodes with forced and non-forced requestable attached to them.
> 
> -Chris
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=211227
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=211229
