[GE users] Open MPI and SGE

Reuti reuti at staff.uni-marburg.de
Thu Apr 27 18:52:47 BST 2006


Hi,

On 27.04.2006 at 18:24, Bernard Li wrote:

> Hi Reuti:
>
> So are you saying that I should be using rsh (qrsh) instead of ssh?

At least from Open MPI's point of view, it's just rsh, so you don't
need to adjust the scripts to create an ssh link and so on...
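
As an aside on the "execv failed with errno=2" message quoted at the bottom of this thread: errno 2 is ENOENT ("No such file or directory"), so whatever execv was asked to run could not be found. Below is a minimal sketch of a check one could drop into the job script. The function name find_agent is made up for illustration, and that the missing file is the launch agent (rather than, say, orted) is only an assumption.

```shell
#!/bin/sh
# Hypothetical helper: report where a launch agent resolves on PATH.
# Prints the resolved path and returns 0, or complains on stderr and returns 1.
find_agent()
{
    agent=$1
    if path=$(command -v "$agent" 2>/dev/null); then
        echo "$path"
        return 0
    fi
    echo "$agent: not found in PATH=$PATH" >&2
    return 1
}

# Example: inside a job, see which "rsh" Open MPI would pick up.
find_agent rsh || true
```

Run inside and outside SGE, this should show whether the job environment sees a different PATH than the login shell does.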

>
> I am using ssh - it does not work within SGE but does work outside  
> - so I know it works...  but anyways perhaps I'll try it with rsh  
> (qrsh) and see.

This only has to be set up inside SGE. If Open MPI thinks it is using
rsh, and SGE converts the rsh call to ssh, it should work.
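
To make the "converted by SGE" part concrete: with -catch_rsh the start script puts a wrapper named rsh early in the job's PATH, and that wrapper hands the call over to qrsh -inherit. The following is only a rough sketch of that idea, not the wrapper shipped with SGE. build_qrsh_cmd is an invented name, rsh options such as -l or -n are not handled, and the function only prints the command line so it can be tried without a cluster.

```shell
#!/bin/sh
# Sketch of what an "rsh" wrapper does under Tight Integration:
#   rsh <host> <command...>  ->  qrsh -inherit -V <host> <command...>
# build_qrsh_cmd is a hypothetical helper; it prints instead of executing.
build_qrsh_cmd()
{
    if [ $# -lt 2 ]; then
        echo "usage: rsh host command [args...]" >&2
        return 1
    fi
    host=$1
    shift
    echo qrsh -inherit -V "$host" "$@"
}

build_qrsh_cmd node22 orted --bootproxy 1
```

A real wrapper would exec qrsh instead of echoing it. Which transport qrsh then uses (rsh or ssh) depends on the rsh_command setting in the SGE configuration.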

>
> Do you think you can post your PE template as well as your
> start/stop scripts?  They are probably the same as mine but I just
> wanted to double check to be sure.

Besides the posted subroutine, I didn't change anything.

-- Reuti

$ qconf -sp openmpi
pe_name           openmpi
slots             88
user_lists        NONE
xuser_lists       NONE
start_proc_args   /usr/sge/openmpi/startopenmpi.sh -catch_rsh $pe_hostfile
stop_proc_args    /usr/sge/openmpi/stopopenmpi.sh
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min
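
For anyone wondering what startopenmpi.sh does with $pe_hostfile: the subroutine quoted further down rewrites each line into the "host slots=N" form used in the machinefile. Here is a small self-contained sketch of that step. The hostfile content is invented sample data, and the function is a lightly tidied variant of the quoted one, not a verbatim copy.

```shell
#!/bin/sh
# Convert SGE pe_hostfile lines ("host slots queue range") into
# Open MPI machinefile lines ("shorthost slots=N").
PeHostfile2MachineFile()
{
    while read -r host nslots rest; do
        # strip the domain part, keep only the short hostname
        echo "${host%%.*} slots=$nslots"
    done < "$1"
}

# Invented sample data standing in for $pe_hostfile.
sample=$(mktemp)
cat > "$sample" <<'EOF'
node22.example.com 2 all.q@node22.example.com UNDEFINED
node42.example.com 1 all.q@node42.example.com UNDEFINED
EOF

PeHostfile2MachineFile "$sample"
rm -f "$sample"
```

In the real start script the output would be redirected to $TMPDIR/machines, which is the file the job script passes to mpiexec via -machinefile.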

> Thanks,
>
> Bernard
>
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Thu 27/04/2006 08:12
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Open MPI and SGE
>
> Hi Bernard:
>
> On 27.04.2006 at 09:40, Bernard Li wrote:
>
> > Hi Reuti:
> >
> > I don't think it's using rsh - it uses ssh by default.
> >
> > I just rebuilt the SRPM from here:
> >
> > http://www.open-mpi.org/software/ompi/v1.0/
> >
> > BTW I also explicitly set OMPI_MCA_pls_rsh_agent="ssh -x" because I
> > used to have issues with xauth with LAM/MPI.  I am fairly certain
> > that it was using ssh and not rsh.  Also it works fine if I run it
> > outside of SGE.  So I know Open MPI is installed properly and works
> > - just not when integrated with SGE.
> >
> > Any chance you can try the latest release of Open MPI and see if it
> > still works?
>
> Well, I just got a minute. For me it's also working fine with 1.0.2.
> But I compiled it a little differently: at the time I started to
> look into the SGE integration, the Open MPI documentation was a
> little vague, so I worked out on my own which parameter had to be
> changed. In the Open MPI source I changed the file:
>
> ~/openmpi-1.0.2/orte/mca/pls/rsh/pls_rsh_component.c
>
> line 161 at that time from:
>
> false, false, "ssh",
>
> to
>
> false, false, "rsh",
>
> Now the original line reads:
>
> false, false, "ssh : rsh",
>
> and I changed this again to contain only rsh, to get a Tight
> Integration. As mentioned, this will jump out of the process tree
> anyway, but it's working for me. We have neither rsh nor ssh in the
> cluster, and use only the qrsh command. Output of mpihello.c:
>
> $ cat test.sh.o17419
> /usr/sge/bin/lx24-x86/qrsh -inherit -V node22 orted --bootproxy 1
> --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename node22
> --universe reuti at node42:default-universe
> --nsreplica "0.0.0;tcp://192.168.154.42:46587"
> --gprreplica "0.0.0;tcp://192.168.154.42:46587" --mpi-call-yield 0
> Hello World from Node 0.
> Hello World from Node 1.
>
> and no errors. The job script is:
>
> $ cat test.sh
> #!/bin/sh
>
> export PATH=/home/reuti/local/openmpi-1.0.2/bin:$PATH
> export LD_LIBRARY_PATH=/home/reuti/local/openmpi-1.0.2/lib:$LD_LIBRARY_PATH
> mpiexec -n $NSLOTS -machinefile $TMPDIR/machines ./mpihello
>
> exit 0
>
>
> Cheers - Reuti
>
>
> >
> > BTW, I was testing against 6.0u7_1 - if that makes any difference...
> >
> > Thanks,
> >
> > Bernard
> >
> > From: Reuti [mailto:reuti at staff.uni-marburg.de]
> > Sent: Wed 26/04/2006 21:57
> > To: users at gridengine.sunsource.net
> > Subject: Re: [GE users] Open MPI and SGE
> >
> > Bernard,
> >
> > did you also compile Open MPI to use rsh instead of ssh? This was
> > the problem in my first attempt.
> >
> > -- Reuti
> >
> >
> > On 27.04.2006 at 06:54, Bernard Li wrote:
> >
> > > Hi Reuti:
> > >
> > > > Yeah I followed your instructions in that email you sent out.  So
> > > > I did modify my "startmpi.sh" script with your modifications.
> > > >
> > > > Anyways, if you have any clues regarding how to get it working
> > > > with 1.0.2, I'd appreciate it, thanks.
> > >
> > > Cheers,
> > >
> > > Bernard
> > >
> > >> -----Original Message-----
> > >> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> > >> Sent: Wednesday, April 26, 2006 21:52
> > >> To: users at gridengine.sunsource.net
> > >> Subject: Re: [GE users] Open MPI and SGE
> > >>
> > >> Hi there,
> > >>
> > > >> the loose integration was straightforward, but there is no tight
> > > >> integration for now:
> > > >>
> > > >> http://gridengine.sunsource.net/servlets/ReadMsg?listName=dev&msgNo=2578
> > > >>
> > > >> I haven't checked the latest sources again, but I'm not aware of a
> > > >> change to "orted".
> > >>
> > >> -- Reuti
> > >>
> > >>
> > >> PS: for the startopenmpi.sh:
> > >>
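> > > >> PeHostfile2MachineFile()
> > > >> {
> > > >>     # Each pe_hostfile line is "host slots queue range"; emit
> > > >>     # "shorthost slots=N" for the Open MPI machinefile.
> > > >>     while read -r line; do
> > > >>        # first field, cut down to the short hostname
> > > >>        host=`echo "$line" | cut -f1 -d" " | cut -f1 -d"."`
> > > >>        # second field is the number of slots granted on that host
> > > >>        nslots=`echo "$line" | cut -f2 -d" "`
> > > >>        echo "$host slots=$nslots"
> > > >>     done < "$1"
> > > >> }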
> > >>
> > > >> On 27.04.2006 at 04:29, Bernard Li wrote:
> > >>
> > > >>> Has anybody been successful with getting Open MPI integrated
> > > >>> with SGE (I think Reuti has ;-) ).
> > > >>>
> > > >>> Anyways, I think I'm pretty close, but I'm stuck with this issue:
> > > >>>
> > > >>> [node1:28248] pls:rsh: execv failed with errno=2
> > > >>>
> > > >>> Does anybody know what it means?
> > > >>>
> > > >>> I basically set it up like Reuti recommended in the following
> > > >>> email:
> > > >>>
> > > >>> http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=15176
> > > >>>
> > >>> My template looks like this:
> > >>>
> > >>> pe_name           openmpi
> > >>> slots             999
> > >>> user_lists        NONE
> > >>> xuser_lists       NONE
> > >>> start_proc_args   /opt/sge/mpi/startmpi.sh $pe_hostfile
> > >>> stop_proc_args    /opt/sge/mpi/stopmpi.sh
> > >>> allocation_rule   $round_robin
> > >>> control_slaves    FALSE
> > >>> job_is_first_task TRUE
> > >>> urgency_slots     min
> > >>>
> > > >>> It might be helpful to post a working integration on the SGE
> > > >>> website.
> > >>>
> > >>> Thanks!
> > >>>
> > >>> Bernard
> > >>>
> > >>>
> > >>
> >  
> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: users- 
> unsubscribe at gridengine.sunsource.net
> > >>> For additional commands, e-mail: users-
> > help at gridengine.sunsource.net
> > >>>
> > >>
> > >>
> >  
> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: users- 
> unsubscribe at gridengine.sunsource.net
> > >> For additional commands, e-mail: users-
> > help at gridengine.sunsource.net
> > >>
> > >>
> > >
> > >
> >  
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > > For additional commands, e-mail: users- 
> help at gridengine.sunsource.net
> > >
> >
> >  
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



