[GE users] Open MPI and SGE

Bernard Li bli at bcgsc.ca
Thu Apr 27 17:24:52 BST 2006



Hi Reuti:
 
So are you saying that I should be using rsh (qrsh) instead of ssh?
 
I am using ssh - it does not work within SGE but does work outside - so I know it works...  but anyways perhaps I'll try it with rsh (qrsh) and see.
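
For what it's worth, errno=2 is ENOENT ("No such file or directory") on Linux, so the "execv failed with errno=2" message quoted at the start of this thread is consistent with the launch agent not being found on the PATH that the job sees. A minimal sketch of a check, assuming it is submitted as an SGE job script and also run in an interactive shell for comparison (the helper name check_agent is made up for illustration):

```shell
#!/bin/sh
# Report whether a candidate launch agent resolves on the current PATH.
# check_agent is a hypothetical helper, not part of SGE or Open MPI.
check_agent() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: $(command -v "$1")"
    else
        echo "$1: not found"
        return 1
    fi
}

# Print the PATH the job actually runs with, then test each candidate.
echo "PATH=$PATH"
for agent in ssh rsh qrsh; do
    check_agent "$agent" || true
done
```

If the output differs between the batch job and the interactive shell, the job environment rather than the Open MPI build is the likely place to look.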
 
Do you think you can post your PE template as well as your start/stop scripts?  They are probably the same as mine but I just wanted to double check to be sure.
 
Thanks,
 
Bernard

________________________________

From: Reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Thu 27/04/2006 08:12
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Open MPI and SGE



Hi Bernard:

On 27.04.2006 at 09:40, Bernard Li wrote:

> Hi Reuti:
>
> I don't think it's using rsh - it uses ssh by default.
>
> I just rebuilt the SRPM from here:
>
> http://www.open-mpi.org/software/ompi/v1.0/
>
> BTW I also explicitly set OMPI_MCA_pls_rsh_agent="ssh -x" because I 
> used to have issues with xauth with LAM/MPI.  I am fairly certain 
> that it was using ssh and not rsh.  Also it works fine if I run it 
> outside of SGE.  So I know Open MPI is installed properly and works 
> - just not when integrated with SGE.
>
> Any chance you can try the latest release of Open MPI and see if it 
> still works?

well - I just got a minute. For me it's also working fine with 1.0.2.
But I compiled it a little differently: at the time I started to
look into the SGE integration, the Open MPI documentation was a
little vague, so I worked out on my own which parameter had to be
changed. In the Open MPI source I changed the file:

~/openmpi-1.0.2/orte/mca/pls/rsh/pls_rsh_component.c

line 161 at that time from:

false, false, "ssh",

to

false, false, "rsh",
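
A possible alternative to patching the source, sketched here under the assumption that the pls_rsh_agent MCA parameter (set through the OMPI_MCA_pls_rsh_agent environment variable quoted above) overrides the compiled-in default at run time. The helper name choose_agent is made up for illustration, and whether "rsh" then resolves to SGE's wrapper depends on the PE start script, which is not shown here:

```shell
#!/bin/sh
# choose_agent is a hypothetical helper: keep an agent the caller already
# set, otherwise fall back to "rsh" (the value used in the patched source).
choose_agent() {
    echo "${1:-rsh}"
}

# Assumption: Open MPI reads this variable as the pls_rsh_agent parameter.
OMPI_MCA_pls_rsh_agent=$(choose_agent "${OMPI_MCA_pls_rsh_agent:-}")
export OMPI_MCA_pls_rsh_agent

echo "launch agent: $OMPI_MCA_pls_rsh_agent"
# mpiexec would be started from this same shell, as in a job script.
```

Setting the variable in the job script keeps the Open MPI build unmodified, at the cost of having to remember it in every job.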

Now the original line reads:

false, false, "ssh : rsh",

and I changed this again to include only rsh, to get a Tight
Integration. As mentioned, this will jump out of the process tree
anyway, but it's working for me. We have neither rsh nor ssh in the
cluster, and use only the qrsh command. Output of mpihello.c:

$ cat test.sh.o17419
/usr/sge/bin/lx24-x86/qrsh -inherit -V node22 orted --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename node22 --universe reuti@node42:default-universe --nsreplica "0.0.0;tcp://192.168.154.42:46587" --gprreplica "0.0.0;tcp://192.168.154.42:46587" --mpi-call-yield 0
Hello World from Node 0.
Hello World from Node 1.

and no errors. The job script is:

$ cat test.sh
#!/bin/sh

export PATH=/home/reuti/local/openmpi-1.0.2/bin:$PATH
export LD_LIBRARY_PATH=/home/reuti/local/openmpi-1.0.2/lib:$LD_LIBRARY_PATH
mpiexec -n $NSLOTS -machinefile $TMPDIR/machines ./mpihello

exit 0
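
The $TMPDIR/machines file used by the job script above is written by the PE start script from SGE's $pe_hostfile. A minimal sketch of that conversion, assuming each pe_hostfile line has the form "host slots queue processors"; the function name, host names and queue names below are made-up sample data, not taken from a real cluster:

```shell
#!/bin/sh
# Convert pe_hostfile lines ("host slots queue processors") into
# Open MPI machinefile lines ("shorthost slots=N").
pe_hostfile_to_machinefile() {
    while read -r host nslots _rest; do
        [ -n "$host" ] || continue       # skip blank lines
        echo "${host%%.*} slots=$nslots" # strip the domain part
    done < "$1"
}

# Hypothetical sample input, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
node22.example.org 2 all.q@node22.example.org UNDEFINED
node23.example.org 1 all.q@node23.example.org UNDEFINED
EOF

pe_hostfile_to_machinefile "$sample"
rm -f "$sample"
```

For the sample data this prints "node22 slots=2" and "node23 slots=1", one per line. In a start script the output would be redirected to $TMPDIR/machines.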


Cheers - Reuti


>
> BTW, I was testing against 6.0u7_1 - if that makes any difference...
>
> Thanks,
>
> Bernard
>
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Wed 26/04/2006 21:57
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Open MPI and SGE
>
> Bernard,
>
> did you also compile Open MPI to use rsh instead of ssh? This was the
> problem in my first attempt.
>
> -- Reuti
>
>
> On 27.04.2006 at 06:54, Bernard Li wrote:
>
> > Hi Reuti:
> >
> > Yeah I followed your instructions in that email you sent out.  So I
> > did modify my "startmpi.sh" script with your modifications.
> >
> > Anyways, if you have any clues regarding how to get it working with
> > 1.0.2, I'd appreciate it, thanks.
> >
> > Cheers,
> >
> > Bernard
> >
> >> -----Original Message-----
> >> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> >> Sent: Wednesday, April 26, 2006 21:52
> >> To: users at gridengine.sunsource.net
> >> Subject: Re: [GE users] Open MPI and SGE
> >>
> >> Hi there,
> >>
> >> the loose integration was straightforward, but there is no tight
> >> integration for now:
> >>
> >> http://gridengine.sunsource.net/servlets/ReadMsg?listName=dev&msgNo=2578
> >>
> >> I haven't checked the latest sources again, but I'm not aware of a
> >> change to "orted".
> >>
> >> -- Reuti
> >>
> >>
> >> PS: for the startopenmpi.sh:
> >>
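> >> PeHostfile2MachineFile()
> >> {
> >>     # Each pe_hostfile line starts with "host slots ...";
> >>     # emit "shorthost slots=N" for the Open MPI machinefile.
> >>     while read -r line; do
> >>        host=`echo "$line" | cut -f1 -d" " | cut -f1 -d"."`
> >>        nslots=`echo "$line" | cut -f2 -d" "`
> >>        echo "$host slots=$nslots"
> >>     done < "$1"
> >> }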
> >>
> >> On 27.04.2006 at 04:29, Bernard Li wrote:
> >>
> >>> Has anybody been successful with getting Open MPI integrated with
> >>> SGE (I think Reuti has ;-) ).
> >>>
> >>> Anyways, I think I'm pretty close, but I'm stuck with this issue:
> >>>
> >>> [node1:28248] pls:rsh: execv failed with errno=2
> >>>
> >>> Does anybody know what it means?
> >>>
> >>> I basically set it up like Reuti recommended in the following email:
> >>>
> >>> http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=15176
> >>>
> >>> My template looks like this:
> >>>
> >>> pe_name           openmpi
> >>> slots             999
> >>> user_lists        NONE
> >>> xuser_lists       NONE
> >>> start_proc_args   /opt/sge/mpi/startmpi.sh $pe_hostfile
> >>> stop_proc_args    /opt/sge/mpi/stopmpi.sh
> >>> allocation_rule   $round_robin
> >>> control_slaves    FALSE
> >>> job_is_first_task TRUE
> >>> urgency_slots     min
> >>>
> >>> It might be helpful to post a working integration on the
> >>> SGE website.
> >>>
> >>> Thanks!
> >>>
> >>> Bernard
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >>>





