[GE users] Machine file and dedicated IB network
igardais at yahoo.fr
Wed May 19 14:56:40 BST 2010
Thanks for the answer.
I did that, but it did not help.
I'm still having a (non-SGE-related) issue with DAPL.
Here is one of the lines:
 MPI startup(): DAPL provider <NULL on rank 0:mania-3.beicip.fr differs from <NULL string> on rank 1:mania-3.beicip.fr
It refers to 'mania-X.beicip.fr', which is the hostname of the Ethernet interface.
The IB interfaces are named 'mania-X-ib.cluster'.
The machine file correctly reports hostname 'mania-X-ib.cluster' and I've forced the I_MPI_DEVICE to "rdma:ib0" to use the ib0 interface.
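One way to double-check this kind of setup is to confirm that every name in the machine file resolves to an address on the IPoIB network. The following is only a sketch: it assumes the 192.168.0.0/24 IPoIB subnet from the original message, and the function names (`on_ipoib`, `hosts_off_ipoib`) are made up for illustration. It checks name resolution only and says nothing about which DAPL provider Intel MPI actually selects.

```python
import ipaddress
import socket

# Assumption: the dedicated, non-routed IPoIB subnet from the original message.
IPOIB_NET = ipaddress.ip_network("192.168.0.0/24")


def on_ipoib(address, network=IPOIB_NET):
    """Return True if the IPv4 address string lies inside the IPoIB subnet."""
    return ipaddress.ip_address(address) in network


def hosts_off_ipoib(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames whose resolved address is NOT on the IPoIB subnet.

    `resolve` is injectable so the check can be exercised without DNS.
    """
    return [name for name in hostnames if not on_ipoib(resolve(name))]
```

Calling `hosts_off_ipoib` with the names read from the machine file should return an empty list if every entry points at an IPoIB address.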
I've read that RHEL 5.4 has a buggy DAPL layer. I'll try the MLNX-OFED-1.5.1 release for RHEL 5.4.
I hope this will help.
From: reuti <reuti at staff.uni-marburg.de>
To: users at gridengine.sunsource.net
Sent: Wed 19 May 2010, 13:27:17
Subject: Re: [GE users] Machine file and dedicated IB network
On 19.05.2010 at 12:51, igardais wrote:
> Until now, we ran SGE only over Ethernet.
> All was fine because the same single network was used for SGE and data.
> We received a new cluster with InfiniBand, and the setup we made is:
> - Ethernet network (172.31.0.0/16) for data and SGE
> - IB network for MPI
> IPoIB is set up with a dedicated, non-routed network (192.168.0.0/24).
> The machine file generated by the PE contains the "ethernet name" of the machines.
> This machine file is used by MPI (IntelMPI) to start the MPI ring.
> Shouldn't the hostnames in the machine file be the "infiniband name" (host-ib) ?
> I've read the "multiple interface howto" but I'm not sure it applies to this case.
As SGE is still running across the Ethernet, you can just map the hostnames as outlined in $SGE_ROOT/mpi/startmpi.sh, where they are mapped to ATM hostnames. Depending on your applications, it may be necessary to supply -hostname to the startmpi.sh script, so that hostnames are mapped in general (also from the application's point of view).
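The hostname mapping described above could be sketched roughly as follows. This is not the code from startmpi.sh, only an illustration in Python. It assumes that each line of the PE hostfile starts with the hostname followed by the slot count, that the machine file should list each host once per slot, and that the InfiniBand name follows the 'mania-X-ib.cluster' pattern mentioned in this thread. The function names `map_to_ib` and `machine_file_lines` are hypothetical.

```python
def map_to_ib(hostname, suffix="-ib", domain="cluster"):
    """Map an Ethernet name such as 'mania-3.beicip.fr' to 'mania-3-ib.cluster'.

    The suffix and domain are assumptions taken from the naming in this thread.
    """
    short_name = hostname.split(".", 1)[0]  # drop the Ethernet domain part
    return f"{short_name}{suffix}.{domain}"


def machine_file_lines(pe_hostfile_text):
    """Build machine file entries (one per slot) with InfiniBand hostnames.

    Assumes each non-empty input line looks like:
        <hostname> <slots> [further columns ignored]
    """
    entries = []
    for raw_line in pe_hostfile_text.splitlines():
        fields = raw_line.split()
        if not fields:
            continue  # skip blank lines
        host = fields[0]
        slots = int(fields[1]) if len(fields) > 1 else 1
        entries.extend([map_to_ib(host)] * slots)
    return entries
```

The resulting list can be written out line by line as the machine file handed to the MPI launcher, while SGE itself keeps using the Ethernet names.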
> Any advice is welcome,
> Thanks and regards,