[GE users] two network cards per node

Reuti reuti at staff.uni-marburg.de
Fri Aug 10 20:51:57 BST 2007


Hi,

On 10.08.2007, at 18:22, John Hearns wrote:

> On Fri, 2007-08-10 at 10:13 -0600, Nicolas Bock wrote:
>>
>> we have two network cards installed per compute node (which all have
>> two CPUs). We would like to make sure that the grid engine utilizes
>> both of them for MPI traffic. We are unclear however how to exactly
>> achieve this. We put the two network cards on separate subnets due to
>> hardware restrictions. When we start sge_execd it binds to the first
>> IP address and does not appear to use the second one.
>
> Gridengine will give you a PE_HOSTFILE which has hostnames (and
> therefore IP Addresses) of the primary network interface.
> You'll need a custom PE start script which will take this file and
> change the names to the names (or IP addresses) of your secondary
> interfaces and feed this as a machines file to your mpirun command.

(Un)fortunately SGE takes care that the qrsh request is coming from a
well-known node (i.e. interface), so faking the PE_HOSTFILE will not
work with a Tight Integration. To start with, there are two options:

a) Route the NFS traffic to the second interface. The SGE
communication will then share the primary interface with the MPI
traffic. If you set up SGE like
http://gridengine.sunsource.net/howto/nfsreduce.html it will even
lower the load on the second (NFS) interface. Some MPI
implementations can't deal with a second interface and require the
primary one; AFAIR with MPICH2 specifying the second interface was
only supported under Windows.
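In practice this can be as simple as mounting the exports via the
file server's name on the second subnet (hostname and path here are
just placeholders):

  # /etc/fstab on the compute nodes
  fileserver-eth1:/export/home  /home  nfs  defaults  0 0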

b) Depending on the MPI you choose: with MPICH1 you could do it the
other way round, i.e. use the primary interface for NFS, set up SGE
to use the second one as its interface
(http://gridengine.sunsource.net/howto/multi_intrfcs.html), and set
the name for the back-channel of MPICH1 by setting MPI_HOST as
outlined in http://gridengine.sunsource.net/howto/mpich-integration.html.
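If I recall correctly, the mapping to the second interface is done
with SGE's host_aliases file; the location and the rule "the name SGE
shall use comes first, the other names follow" are from memory, so
please check the howto:

  # $SGE_ROOT/default/common/host_aliases
  # first column: name of the second interface (used by SGE),
  # further columns: other names the host is known under
  node01-eth1  node01
  node02-eth1  node02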


Another option, if you have heavy MPI communication: it might be
worth looking into channel bonding in your cluster, so that the two
interfaces appear as one to the applications (and SGE).
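On Linux this is the bonding driver; a minimal sketch (RHEL-style
syntax from memory, mode and addresses are just placeholders - your
distribution's documentation is authoritative here):

  # /etc/modprobe.conf
  alias bond0 bonding
  options bond0 mode=balance-rr miimon=100

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  IPADDR=192.168.1.10
  NETMASK=255.255.255.0
  BOOTPROTO=none
  ONBOOT=yes

  # /etc/sysconfig/network-scripts/ifcfg-eth0 (likewise for eth1)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  BOOTPROTO=none
  ONBOOT=yes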

-- Reuti

>
>
>>  Can we tell it to bind one slot to each network card?
>
> I *suppose* you could have two PEs - one which does the above
> address munging and one which does not, then use the PEs with round_robin
>
> As usual, someone with a clue will be along in a minute.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



