[GE users] Machine file and dedicated IB network

reuti reuti at staff.uni-marburg.de
Wed May 19 15:08:11 BST 2010


Hi,

On 19.05.2010 at 15:56, igardais wrote:

> Hi Reuti,
> 
> Thanks for the answer.
> I did that, but it did not help.
> 
> I'm still having a (non-SGE-related) issue with DAPL.
> Here is one of the lines:
> [1] MPI startup(): DAPL provider <NULL on rank 0:mania-3.beicip.fr differs from <NULL string> on rank 1:mania-3.beicip.fr

Is it MPICH1-based? Then the back channel might still be wrong (http://gridengine.sunsource.net/howto/mpich-integration.html) and you need to either:

$ export MPI_HOST=`hostname | sed 's/[.]beicip[.]fr$/-ib.cluster/'`

or adjust the "$SGE_ROOT/mpi/hostname" script so that it also reports something like 'mania-X-ib.cluster' for a `hostname` command (the correctly adjusted name for each machine, of course) and activate it by adding the "-hostname" keyword to start_proc_args.
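
As a rough sketch of the name mapping both options rely on: the function names map_to_ib and map_machinefile below are made up for illustration and are not part of SGE, and the code assumes the naming scheme from your mail (mania-X.beicip.fr on Ethernet, mania-X-ib.cluster on IB) and a machine file with one plain hostname per line.

```shell
#!/bin/sh
# Hypothetical helpers, not part of SGE or any MPI distribution.
# Assumed naming scheme (taken from this thread):
#   mania-X.beicip.fr (Ethernet)  ->  mania-X-ib.cluster (IPoIB)

# Map one Ethernet hostname (short or fully qualified) to its IB name.
map_to_ib() {
    case "$1" in
        # Already an IB name: pass it through unchanged.
        *-ib.cluster) printf '%s\n' "$1" ;;
        # Otherwise strip any domain part and append the IB suffix.
        *)            printf '%s\n' "${1%%.*}-ib.cluster" ;;
    esac
}

# Rewrite a machine file read from stdin, one hostname per line,
# writing the mapped names to stdout. Empty lines are skipped.
map_machinefile() {
    while read -r host; do
        [ -n "$host" ] || continue
        map_to_ib "$host"
    done
}
```

With these defined, the export above could be written as MPI_HOST=$(map_to_ib "$(hostname)"), and a machine file could be filtered with map_machinefile < machines > machines.ib (both file names are placeholders).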

-- Reuti


> It refers to 'mania-X.beicip.fr' which is the hostname of the ethernet interface.
> The IB interfaces are named 'mania-X-ib.cluster'.
> 
> The machine file correctly reports hostname 'mania-X-ib.cluster' and I've forced the I_MPI_DEVICE to "rdma:ib0" to use the ib0 interface.
> 
> I've read that RHEL 5.4 has a buggy DAPL layer. I'll try the MLNX-OFED-1.5.1 release for RHEL 5.4.
> Hope this will help.
> 
> Ionel
> 
> 
> From: reuti <reuti at staff.uni-marburg.de>
> To: users at gridengine.sunsource.net
> Sent: Wed 19 May 2010, 13:27:17
> Subject: Re: [GE users] Machine file and dedicated IB network
> 
> Hi,
> 
> On 19.05.2010 at 12:51, igardais wrote:
> 
> > Until now, we ran SGE only over Ethernet.
> > All was fine because the same single network was used for SGE and data.
> > 
> > We received a new cluster with InfiniBand, and the setup we made is:
> > - ethernet network (172.31.0.0/16) for data and SGE
> > - IB network for MPI
> > 
> > IPoIB is set up with a dedicated, non-routed network (192.168.0.0/24).
> > 
> > 
> > The machine file generated by the PE contains the "ethernet name" of each machine.
> > This machine file is used by MPI (IntelMPI) to start the MPI ring.
> > 
> > Shouldn't the hostnames in the machine file be the "infiniband name" (host-ib) ?
> > 
> > I've read the "multiple interface howto" but I'm not sure it applies to this case.
> 
> As SGE is still running across the ethernet, you can just map the hostnames as outlined in $SGE_ROOT/mpi/startmpi.sh, where they are mapped to ATM hostnames. Depending on your applications, it can be necessary to supply -hostname to the startmpi.sh script, so that hostnames are mapped in general (also from the application's point of view).
> 
> -- Reuti
> 
> 
> > Any advice is welcome,
> > Thanks and regards,
> > Ionel
> > 
> > 
> >
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257848
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> 
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257863
