[GE users] The distribution of cores

reuti reuti at staff.uni-marburg.de
Tue Jul 20 12:23:48 BST 2010


Hi,

On 20.07.2010 at 10:19, gqc606 wrote:

> I installed MPICH2 and SGE on my computers, but there is a problem I need to solve. For example, when I submit a job that needs 20 cores, SGE always spreads the 20 cores across all the compute nodes, as follows:
> 
> all.q@compute-0-0.local        BIP   0/3/4          2.23     lx26-amd64
>      5 0.55500 smooth     test         r     07/19/2010 21:15:34     3
> ---------------------------------------------------------------------------------
> all.q@compute-0-1.local        BIP   0/3/4          2.09     lx26-amd64
>      5 0.55500 smooth     test         r     07/19/2010 21:15:34     3
> ---------------------------------------------------------------------------------
> all.q@compute-0-2.local        BIP   0/3/4          2.15     lx26-amd64
>      5 0.55500 smooth     test         r     07/19/2010 21:15:34     3
> ---------------------------------------------------------------------------------
> all.q@compute-0-3.local        BIP   0/4/4          3.36     lx26-amd64
>      5 0.55500 smooth     test         r     07/19/2010 21:15:34     4
> ---------------------------------------------------------------------------------
> all.q@compute-0-4.local        BIP   0/4/4          2.99     lx26-amd64
>      5 0.55500 smooth     test         r     07/19/2010 21:15:34     4
> ---------------------------------------------------------------------------------
> all.q@compute-0-5.local        BIP   0/3/4          2.08     lx26-amd64
>      5 0.55500 smooth     test         r     07/19/2010 21:15:34     3
> 
> As communication between computers takes a long time, I want SGE to assign this job to idle nodes first and only then distribute the remaining cores to nodes that are already busy. What should I do to solve this problem? Thanks!

There is nothing planned to implement a sophisticated distribution algorithm for parallel jobs. In former times, with one or two physical single-core CPUs per machine, this was not a big issue. And before anything like this was implemented, we are now reaching a time, even in the standard PC area, where you can easily have 12 cores per physical CPU, so in your case everything could run locally on just two CPUs.

What you could try, to minimize the effect of slot distribution: define "allocation_rule 4" for your PE. This way you are of course limited to multiples of 4 as valid slot requests for this PE, and you always get complete machines for your job.
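
To make the effect of a fixed allocation rule concrete, here is a small Python sketch. It is only an illustration of the arithmetic, not SGE's scheduler: the function name allocate_fixed and the cluster dictionary are made up for this example, and the host names and the 4 slots per host are taken from the qstat output quoted above. The rule itself would be set in the PE definition (for example by editing it with "qconf -mp" and your PE's name).

```python
def allocate_fixed(free_slots, requested, per_host):
    """Illustrative model of a fixed allocation rule (not SGE code).

    Every chosen host must contribute exactly `per_host` slots, so the
    request has to be a multiple of `per_host`.
    """
    if requested % per_host != 0:
        raise ValueError(
            f"{requested} slots is not a multiple of {per_host}"
        )
    hosts_needed = requested // per_host
    # Only hosts that can still offer the full per-host amount qualify.
    eligible = [h for h, free in free_slots.items() if free >= per_host]
    if len(eligible) < hosts_needed:
        return None  # not enough complete machines: the job would wait
    return {h: per_host for h in eligible[:hosts_needed]}


# Hypothetical empty cluster: six hosts with 4 slots each, as in the post.
cluster = {f"compute-0-{i}.local": 4 for i in range(6)}

placement = allocate_fixed(cluster, requested=20, per_host=4)
for host, slots in placement.items():
    print(host, slots)
```

Under these assumptions the 20 slots land on five complete machines instead of being spread over all six, and a request for e.g. 18 slots is rejected because it is not a multiple of 4.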

-- Reuti

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=269224
