[GE users] Machine file and dedicated IB network

igardais igardais at yahoo.fr
Wed May 19 14:56:40 BST 2010


Hi Reuti,

Thanks for the answer.
I did that, but it did not help.

I'm still having a (non-SGE-related) issue with DAPL.
Here is one of the lines:
[1] MPI startup(): DAPL provider <NULL on rank 0:mania-3.beicip.fr differs from <NULL string> on rank 1:mania-3.beicip.fr

It refers to 'mania-X.beicip.fr' which is the hostname of the ethernet interface.
The IB interfaces are named 'mania-X-ib.cluster'.

The machine file correctly reports hostname 'mania-X-ib.cluster' and I've forced the I_MPI_DEVICE to "rdma:ib0" to use the ib0 interface.
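
A quick way to check which network a resolved address belongs to is sketched below. It is only an illustration: the two networks are the ones quoted later in this thread (172.31.0.0/16 for Ethernet, 192.168.0.0/24 for IPoIB), and the function name `classify` is hypothetical.

```python
import ipaddress

# Networks as described in this thread (assumed to be still accurate).
ETHERNET_NET = ipaddress.ip_network("172.31.0.0/16")
IPOIB_NET = ipaddress.ip_network("192.168.0.0/24")

def classify(addr):
    """Return 'ipoib', 'ethernet' or 'other' for an IPv4 address string."""
    ip = ipaddress.ip_address(addr)
    if ip in IPOIB_NET:
        return "ipoib"
    if ip in ETHERNET_NET:
        return "ethernet"
    return "other"
```

Feeding it the result of `socket.gethostbyname()` for a name such as 'mania-X-ib.cluster' would show whether that name really resolves onto the IPoIB network.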

I've read that RHEL 5.4 has a buggy DAPL layer. I'll try the MLNX-OFED-1.5.1 release for RHEL 5.4.
I hope this will help.

Ionel


________________________________
From: reuti <reuti at staff.uni-marburg.de>
To: users at gridengine.sunsource.net
Sent: Wed 19 May 2010, 13:27:17
Subject: Re: [GE users] Machine file and dedicated IB network

Hi,

On 19.05.2010 at 12:51, igardais wrote:

> Until now, we ran SGE only over Ethernet.
> All was fine because the same single network was used for SGE and data.
>
> We received a new cluster with InfiniBand, and the setup we made is:
> - Ethernet network (172.31.0.0/16) for data and SGE
> - IB network for MPI
>
> IPoIB is set up on a dedicated, non-routed network (192.168.0.0/24).
>
>
> The machine file generated by the PE contains the "ethernet name" of the machines.
> This machine file is used by MPI (IntelMPI) to start the MPI ring.
>
> Shouldn't the hostnames in the machine file be the "infiniband name" (host-ib) ?
>
> I've read the "multiple interface howto" but I'm not sure it applies to this case.

As SGE is still running over Ethernet, you can simply map the hostnames as outlined in $SGE_ROOT/mpi/startmpi.sh, where they are mapped to ATM hostnames. Depending on your applications, it may be necessary to supply -hostname to the startmpi.sh script, so that hostnames are mapped in general (also from the application's point of view).
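
The kind of mapping meant here can be sketched as follows. This is not the code from startmpi.sh, just a minimal Python illustration that assumes the naming scheme mentioned in this thread (Ethernet 'mania-X.beicip.fr', InfiniBand 'mania-X-ib.cluster') and a machine file with one host per line, optionally written as 'host:slots'. The function names are hypothetical.

```python
# Assumed naming scheme, taken from the hostnames quoted in this thread.
ETH_SUFFIX = ".beicip.fr"
IB_SUFFIX = "-ib.cluster"

def to_ib_name(host):
    """Map an Ethernet host name to its IB name; leave other names unchanged."""
    if host.endswith(ETH_SUFFIX):
        return host[: -len(ETH_SUFFIX)] + IB_SUFFIX
    return host

def map_machine_file(text):
    """Rewrite the host field of every non-empty line of a machine file."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            # Keep an optional ':slots' part attached to the host name.
            name, sep, rest = fields[0].partition(":")
            fields[0] = to_ib_name(name) + sep + rest
        out.append(" ".join(fields))
    return "\n".join(out)
```

The rewritten text would then be written to the machine file handed to MPI, while SGE itself keeps using the Ethernet names.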

-- Reuti


> Any advice is welcome,
> Thanks and regards,
> Ionel
>
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257848
