[GE users] mpi jobs and forced requestable resources

craffi dag at sonsorol.org
Thu Aug 6 18:29:32 BST 2009

Hi folks,

I've got a 1024-core cluster with 960 processors in "normal" servers  
and an additional 64 processors in special "large memory" server nodes.

We made the large memory nodes usable only if the user requests a
special forced resource "qsub -l largeNode=true ..." so that they are
not used for everyday, run-of-the-mill jobs. This works fine for
normal usage.
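
For context, a forced boolean complex of this kind is typically set up
roughly as shown below. This is a sketch only: the shortcut "ln", the
default value, and the host name "bigmem01" are placeholders for
illustration, not values taken from the actual cluster configuration.

```shell
# List the complex (resource attribute) definitions. The columns are:
# name  shortcut  type  relop  requestable  consumable  default  urgency
qconf -sc | grep largeNode
# An entry for a forced boolean might look something like this
# (illustrative, not real output):
#   largeNode  ln  BOOL  ==  FORCED  NO  FALSE  0

# Show the attribute attached to one large-memory execution host.
# "bigmem01" is a hypothetical host name.
qconf -se bigmem01 | grep complex_values
# Illustrative only:
#   complex_values  largeNode=true
```

With "requestable" set to FORCED, only jobs that explicitly request
largeNode are scheduled onto hosts or queues carrying the attribute.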

However, I'm having trouble with that forced complex when trying to run
a parallel task across all 1024 cores ...


(1) "qsub -pe openmpi 1024": Job pends forever because there are "only  
960 slots"

(2) "qsub -l largeNode=true -pe openmpi 1024": Job pends forever  
because "there are only 64 slots"

(3) "qsub -soft -l largeNode=true -pe openmpi 1024": Same error --  
only 64 slots are available with the forced resource request applied

Right now it looks like the forced requestable attached to the
largeNodes means that I can't have an MPI job span regular and "large"
nodes.

This limits me to 960 cores if I stick to the "normal" nodes and 64
cores if I request the largeNode resource.

I had hoped that making a soft resource request for the "forced"
resource would let me run a 1024-way MPI job, but it does not seem to
work.

Looking for confirmation that I'm stuck with the current config, or a
clue on how to properly make the qsub call so that my job can span
nodes with and without the forced requestable attached to them.


