[GE users] SGE configuration for multi-core multi-processor nodes cluster

reuti reuti at staff.uni-marburg.de
Tue Jan 26 00:12:32 GMT 2010


On 25.01.2010 at 16:04, letigre wrote:

> Hi all,
> I am pretty new here, so please forgive me if my question sounds too
> basic, but your help would be much appreciated.
> I have been submitting parallel jobs to a new cluster which
> comprises 44 nodes, each made of 2 quad-core processors, i.e. 8
> processors per node. SGE has been set up in a very basic way on the
> cluster and, in its current state, SGE always distributes an
> N-processor job across N nodes (if N is less than 44).
> This is not very efficient for large jobs, and the distribution is
> not very predictable as soon as N > 44. What we are looking for
> is the following configuration:
> (a) serial and parallel jobs of up to 7 processors are piled onto
> already partly occupied nodes, if available.
> Example: assume that all nodes are free except Node 1, on
> which 3 processors are used. If I submit a 4-processor job, I
> would like all processes to run on Node 1. In the current
> configuration I would get the following: process 1 on Node 2,
> process 2 on Node 3, process 3 on Node 4 and process 4 on Node 5.

If you really want the nodes filled one after the other, you need to
set up sequence numbers for each queue instance in ascending order
and set "queue_sort_method seqno" in the scheduler configuration. One
option could be to fill the cluster with serial jobs from one side and
with parallel jobs from the other. This would mean one queue, serial.q,
set up as just described, plus a parallel.q whose queue instances have
their sequence numbers in the reverse order.
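
A rough sketch of what this could look like, showing only the relevant
lines. The host names node01 to node03 are placeholders (the pattern
would continue up to the 44th node), and the lines starting with "#"
are annotations, not part of the configuration:

```
# scheduler configuration (qconf -msconf)
queue_sort_method    seqno

# serial.q (qconf -mq serial.q): ascending
seq_no               0,[node01=1],[node02=2],[node03=3]

# parallel.q (qconf -mq parallel.q): descending
seq_no               0,[node01=44],[node02=43],[node03=42]
```

With numbers like these, serial.q should be filled starting at node01,
while parallel.q gets its lowest sequence numbers at the other end of
the cluster.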

To avoid oversubscription, the slots per node must be limited, either
by a complex in each exechost definition ("complex_values slots=8") or
by an RQS with a rule "limit hosts {*} to slots=8", since you now have
two queues per node with 8 slots each.
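
For illustration, the two alternatives might be written as below. Only
one of them is needed. The host name node01 and the RQS name
slots_per_host are made up for this sketch, and the "#" lines are
annotations only:

```
# alternative 1: in each exechost definition (qconf -me node01)
complex_values    slots=8

# alternative 2: one resource quota set for all hosts (qconf -arqs)
{
   name         slots_per_host
   description  "at most 8 slots per host across all queues"
   enabled      TRUE
   limit        hosts {*} to slots=8
}
```

The RQS variant has the advantage of being a single definition for the
whole cluster, while the exechost variant has to be repeated on every
node.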

> (b) big jobs submitted on M processors, where M is a multiple of 8,  
> are distributed on M/8 nodes, if available, which implies that my  
> job occupies all processors of the nodes on which it runs.

For this you need a PE with a fixed "allocation_rule 8". But then you
are limited to exactly this allocation. Although you can use
wildcards to select one of the available PEs (maybe a PE called mpi8
and another called mpi4 with a fixed "allocation_rule 4"), there is
nothing like a sequence number for PEs (i.e. which PE should be tried
first).
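
As a sketch, the two PEs could be defined roughly like this (only the
relevant lines are shown, the other PE fields are left out). The slot
count 352 is an assumption (44 nodes x 8), job.sh is a placeholder job
script, and the "#" lines are annotations only:

```
# qconf -ap mpi8
pe_name            mpi8
slots              352
allocation_rule    8

# qconf -ap mpi4
pe_name            mpi4
slots              352
allocation_rule    4

# wildcard request: SGE picks one of the matching PEs
qsub -pe "mpi*" 32 job.sh
```

The PEs also have to be listed in the pe_list of the queue in which
the parallel jobs should run.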

> Option (a) would help to keep nodes with all processors free,
> so that option (b) is feasible.
> Ultimately, it would be nice to be able to set the number
> of processors used on each node by a job.
> Example: I submit a 32-processor job and I would like it to run on
> 8 nodes with 4 processors per node.

This is only possible by preparing some PEs with the intended
allocation rule. It's not user-selectable at qsub time, other than by
requesting a suitable PE (mpi4 in this case).
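
For example, assuming PEs named mpi4 and mpi8 exist with the fixed
allocation rules 4 and 8 mentioned above (job.sh is a placeholder for
the job script):

```
# 32 slots as 8 nodes x 4 slots each
qsub -pe mpi4 32 job.sh

# 32 slots as 4 nodes x 8 slots each
qsub -pe mpi8 32 job.sh
```

With a fixed allocation rule, the requested slot count should be a
multiple of the rule (here 4 or 8).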

-- Reuti

> Is there a way to configure SGE to do all or part of this?
> Is this part of the SGE setup, or should I specify it when I
> submit jobs with "qsub"?
> I am pretty sure this might sound like basic configuration to many  
> of you but any advice would be very helpful.
> Thanks in advance,
> Anthony
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=240902
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].


