[GE users] Open MPI and SGE

Reuti reuti at staff.uni-marburg.de
Thu Apr 27 16:12:43 BST 2006


Hi Bernard:

On 27.04.2006 at 09:40, Bernard Li wrote:

> Hi Reuti:
>
> I don't think it's using rsh - it uses ssh by default.
>
> I just rebuilt the SRPM from here:
>
> http://www.open-mpi.org/software/ompi/v1.0/
>
> BTW I also explicitly set OMPI_MCA_pls_rsh_agent="ssh -x" because I  
> used to have issues with xauth with LAM/MPI.  I am fairly certain  
> that it was using ssh and not rsh.  Also it works fine if I run it  
> outside of SGE.  So I know Open MPI is installed properly and works  
> - just not when integrated with SGE.
>
> Any chance you can try the latest release of Open MPI and see if it  
> still works?

Well, I just had a minute to try it. For me it also works fine with 1.0.2.
But I compiled it a little differently: when I started to look into
the SGE integration, the Open MPI documentation was a bit vague, so I
worked out on my own which parameter had to be changed. In the Open MPI
source I changed the file:

~/openmpi-1.0.2/orte/mca/pls/rsh/pls_rsh_component.c

changing what was line 161 at that time from:

false, false, "ssh",

to

false, false, "rsh",

Now the original line reads:

false, false, "ssh : rsh",

and I changed this again so that it contains only rsh, to get a Tight
Integration. As mentioned, this will jump out of the process tree
anyway, but it's working for me. We have neither rsh nor ssh in the
cluster and use only the qrsh command. Output of mpihello.c:

$ cat test.sh.o17419
/usr/sge/bin/lx24-x86/qrsh -inherit -V node22 orted --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename node22 --universe reuti@node42:default-universe --nsreplica "0.0.0;tcp://192.168.154.42:46587" --gprreplica "0.0.0;tcp://192.168.154.42:46587" --mpi-call-yield 0
Hello World from Node 0.
Hello World from Node 1.

and no errors. The jobscript is:

$ cat test.sh
#!/bin/sh

export PATH=/home/reuti/local/openmpi-1.0.2/bin:$PATH
export LD_LIBRARY_PATH=/home/reuti/local/openmpi-1.0.2/lib:$LD_LIBRARY_PATH
mpiexec -n $NSLOTS -machinefile $TMPDIR/machines ./mpihello

exit 0
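
The $TMPDIR/machines file given to -machinefile is written by the PE start script, which converts $pe_hostfile. A minimal sketch of that conversion, run against a made-up two-node hostfile. The host and queue names and the helper name pe_hostfile_to_machinefile are only illustrative, and the usual pe_hostfile layout (host, slots, queue, processor range per line) is assumed:

```shell
#!/bin/sh
# Convert an SGE pe_hostfile into an Open MPI machinefile.
# Assumed input line layout: <host> <slots> <queue> <processor range>
pe_hostfile_to_machinefile()
{
    while read -r host nslots _rest; do
        # Skip empty lines.
        [ -n "$host" ] || continue
        # Strip the domain part so only the short host name remains.
        echo "${host%%.*} slots=$nslots"
    done < "$1"
}

# Illustrative hostfile; the entries are invented for this example.
sample=$(mktemp)
cat > "$sample" <<'EOF'
node22.example.org 2 all.q@node22.example.org UNDEFINED
node42.example.org 1 all.q@node42.example.org UNDEFINED
EOF

machines=$(pe_hostfile_to_machinefile "$sample")
echo "$machines"
rm -f "$sample"
```

For the sample input this prints "node22 slots=2" and "node42 slots=1", one entry per line, which is the form mpiexec reads from the machinefile.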

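Regarding "execv failed with errno=2": errno 2 is ENOENT, i.e. the program to be executed was not found. One way to narrow this down is to check inside the jobscript whether the first word of the configured agent resolves to an executable in PATH. This is only a sketch: the helper name check_agent is made up, and it assumes the agent is taken from OMPI_MCA_pls_rsh_agent as set in your environment, with "rsh" as the fallback as in my build:

```shell
#!/bin/sh
# Report whether the launch agent's executable can be found in PATH.
# $1 is the agent string, e.g. "ssh -x"; only its first word is the program.
check_agent()
{
    # Split on whitespace and keep the first word.
    set -- $1
    prog=$1
    if [ -z "$prog" ]; then
        echo "no agent given"
        return 2
    fi
    if command -v "$prog" >/dev/null 2>&1; then
        echo "found: $(command -v "$prog")"
        return 0
    fi
    echo "not found in PATH: $prog"
    return 1
}

# Assumption: fall back to "rsh" when the variable is unset.
check_agent "${OMPI_MCA_pls_rsh_agent:-rsh}" \
    || echo "starting the agent would likely fail with ENOENT"
```

Run from within the job, this shows the PATH the job really has on the node, which can differ from the one in an interactive shell.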

Cheers - Reuti


>
> BTW, I was testing against 6.0u7_1 - if that makes any difference...
>
> Thanks,
>
> Bernard
>
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Wed 26/04/2006 21:57
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Open MPI and SGE
>
> Bernard,
>
> you compiled OpenMPI also to use rsh instead of ssh? This was the
> problem in my first attempt.
>
> -- Reuti
>
>
> On 27.04.2006 at 06:54, Bernard Li wrote:
>
> > Hi Reuti:
> >
> > Yeah I followed your instructions in that email you sent out.  So I did
> > modify my "startmpi.sh" script with your modifications.
> >
> > Anyways, if you have any clues regarding how to get it working with
> > 1.0.2, I'd appreciate it, thanks.
> >
> > Cheers,
> >
> > Bernard
> >
> >> -----Original Message-----
> >> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> >> Sent: Wednesday, April 26, 2006 21:52
> >> To: users at gridengine.sunsource.net
> >> Subject: Re: [GE users] Open MPI and SGE
> >>
> >> Hi there,
> >>
> >> the loose integration was straightforward, but there is no tight
> >> integration for now:
> >>
> >> http://gridengine.sunsource.net/servlets/ReadMsg?listName=dev&msgNo=2578
> >>
> >> I didn't check the latest sources again, but I'm not aware of a
> >> change of "orted".
> >>
> >> -- Reuti
> >>
> >>
> >> PS: for the startopenmpi.sh:
> >>
> >> PeHostfile2MachineFile()
> >> {
> >>     cat $1 | while read line; do
> >>        # echo $line
> >>        host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
> >>        nslots=`echo $line|cut -f2 -d" "`
> >>        echo $host slots=$nslots
> >>     done
> >> }
> >>
> >> On 27.04.2006 at 04:29, Bernard Li wrote:
> >>
> >>> Has anybody been successful with getting Open MPI integrated with
> >>> SGE (I think Reuti has ;-) )?
> >>>
> >>> Anyways, I think I'm pretty close, but I'm stuck with this issue:
> >>>
> >>> [node1:28248] pls:rsh: execv failed with errno=2
> >>>
> >>> Does anybody know what it means?
> >>>
> >>> I basically set it up like Reuti recommended in the following email:
> >>>
> >>> http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=15176
> >>>
> >>> My template looks like this:
> >>>
> >>> pe_name           openmpi
> >>> slots             999
> >>> user_lists        NONE
> >>> xuser_lists       NONE
> >>> start_proc_args   /opt/sge/mpi/startmpi.sh $pe_hostfile
> >>> stop_proc_args    /opt/sge/mpi/stopmpi.sh
> >>> allocation_rule   $round_robin
> >>> control_slaves    FALSE
> >>> job_is_first_task TRUE
> >>> urgency_slots     min
> >>>
> >>> It might be helpful to post a working integration on the SGE website.
> >>>
> >>> Thanks!
> >>>
> >>> Bernard
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >>>
