[GE users] Machine file and dedicated IB network

beatrubi beat at 0x1b.ch
Thu May 20 06:01:31 BST 2010


Hi Ionel!

Quoting <igardais at yahoo.fr> (19.05.10 15:56):

> The machine file correctly reports hostname 'mania-X-ib.cluster' and I've
> forced the I_MPI_DEVICE to "rdma:ib0" to use the ib0 interface.

Don't worry about the names in the machinefile. Use the "Ethernet names":
all InfiniBand-aware MPIs, such as MVAPICH/MVAPICH2, OpenMPI, Intel MPI or
Platform/HP MPI, are clever enough to automatically use the fast
interconnect when it is available.

Of course you have to configure IPoIB to be able to use Intel MPI. But even
then, you won't have to use the IPoIB names in the machinefile.
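Here is a hedged sketch of how a Grid Engine job script might build such a machinefile. Grid Engine hands a parallel job its granted hosts in `$PE_HOSTFILE`, one line per host with the slot count in the second column, and those are the ordinary (Ethernet) hostnames. The function name `pe_to_machinefile` is hypothetical. The one-hostname-per-slot output is an assumption that most MPIs accept, so check your MPI's documentation for its exact machinefile format.

```shell
#!/bin/sh
# Hypothetical helper: turn Grid Engine's $PE_HOSTFILE into a machinefile.
# Assumption: each $PE_HOSTFILE line reads "host slots queue processors".
# The hosts are the regular (Ethernet) names; no IPoIB names are needed.
# Output: the hostname repeated once per granted slot.
pe_to_machinefile() {
    # $1: PE hostfile to read (defaults to $PE_HOSTFILE)
    while read -r host slots _rest; do
        i=0
        while [ "$i" -lt "$slots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "${1:-$PE_HOSTFILE}"
}
```

Inside a job script you would then call something like `pe_to_machinefile > "$TMPDIR/machinefile"` and pass that file to `mpiexec -machinefile`.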

> I'm still having a (non-SGE-related) issue about DAPL.
> I've read that RHEL 5.4 has a buggy DAPL layer. I'll see with the
> MLNX-OFED-1.5.1 release for RHEL 5.4.

DAPL is a pain in the ass. Try the following lines in your .bashrc:

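# Tell Intel MPI which DAT (DAPL) library to load
export I_MPI_DAT_LIBRARY=/usr/lib64/libdat.so.1
# Shared memory within a node, RDMA via DAPL between nodes
export I_MPI_DEVICE=rdssm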

Also enable debugging output to see why it fails:

mpiexec -machinefile machinefile -n 2 -env I_MPI_DEBUG 3 <mpi app>

Use a benchmark like the Intel MPI Benchmarks (formerly Pallas) to check
whether you are really using Infiniband. Intel MPI has a fallback: your
application still works when Infiniband couldn't be initialized, but it
runs much more slowly.
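One way to make that check concrete is to run the PingPong test from the Intel MPI Benchmarks on two nodes and look at the peak bandwidth. The sketch below assumes the result rows of the `IMB-MPI1 PingPong` output have four numeric columns (`#bytes #repetitions t[usec] Mbytes/sec`). The helper name `check_pingpong` and the default 500 MB/s threshold are hypothetical: Gigabit Ethernet cannot carry much more than about 120 MB/s, so a peak well above that suggests the fast interconnect is in use.

```shell
#!/bin/sh
# Hypothetical helper: report the peak PingPong bandwidth and return
# success only if it reaches the given threshold.
# Assumption: IMB result rows have exactly 4 columns, the first numeric:
#   #bytes  #repetitions  t[usec]  Mbytes/sec
check_pingpong() {
    # $1: file holding IMB-MPI1 PingPong output; $2: threshold in MB/s
    awk -v min="${2:-500}" '
        NF == 4 && $1 ~ /^[0-9]+$/ { if ($4 + 0 > max) max = $4 + 0 }
        END {
            printf "peak bandwidth: %.2f MB/s\n", max
            if (max + 0 >= min + 0) exit 0
            exit 1
        }' "$1"
}
```

For example: `mpiexec -machinefile machinefile -n 2 ./IMB-MPI1 PingPong > pingpong.txt`, then `check_pingpong pingpong.txt`. Place the two ranks on different nodes, otherwise you only measure shared memory.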

HTH
Beat

-- 
     \|/                           Beat Rubischon <beat at 0x1b.ch>
   ( 0^0 )                             http://www.0x1b.ch/~beat/
oOO--(_)--OOo---------------------------------------------------
My experiences, thoughts and dreams: http://www.0x1b.ch/blog/

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257939



