[GE users] Anybody able to get SGE MPICH2 Tight Integration via SSH working? - SOLVED

Jonathan Schreiter jonathanschreiter at yahoo.com
Sat Jan 7 15:48:42 GMT 2006


> >
> > 1) Follow ssh howto and ssh tight integration howto
> > (with minor changes below)
> >
> > 2) Changes to Reuti's start_mpich2.c
> > rsh_argv[0]="rsh"; to rsh_argv[0]="ssh";
> > rsh_argv[3]="-port"; to rsh_argv[3]="-p";
> >
> by using rsh here, you will use the rsh-wrapper. If you name the link
> in the start_proc_args script ssh as well, so that it points to the
> rsh-wrapper, that's fine; the change is just cosmetic and shouldn't
> affect the behavior. Whether it's -p or -port should also make no
> difference, but I must admit that the README in the MPICH2 source in
> the smpd-dir says -port, the pdf-documents say -p, and the smpd on its
> own states "-port <port> or -p <port>" (this third argument goes to
> the started smpd).
> > then rerun ./aimk and ./install.sh
> >
> > This changes the way $SGE_ROOT/mpich2_smpd/bin/$ARCH/mpich2 calls
> > qrsh to start smpd on the exe hosts from rsh to ssh.

Yes, I just changed to -p to match the manual (I don't
think it really matters).  I changed the .c file back
to rsh to see if it would make a difference, and
everything is still working.  I guess that fix isn't
required after all.
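
To illustrate the point about the link name: a command is looked up through PATH, so a link called ssh in a directory that comes first in PATH is what actually runs, whatever it points to. The sketch below is only an illustration under assumptions: it uses a scratch directory in place of SGE's $TMPDIR and a stand-in wrapper script that just reports how it was called. The real rsh-wrapper is not reproduced here, and the host name node01 and port 20000 are made up.

```shell
#!/bin/sh
# Sketch: make "ssh" resolve to a wrapper through a per-job link.
jobtmp=$(mktemp -d)                 # stands in for SGE's $TMPDIR (assumption)
trap 'rm -rf "$jobtmp"' EXIT        # clean up the scratch directory on exit

wrapper="$jobtmp/rsh_wrapper"       # hypothetical stand-in for the rsh-wrapper
printf '#!/bin/sh\necho "wrapper called as $(basename "$0") with: $*"\n' > "$wrapper"
chmod +x "$wrapper"

ln -s "$wrapper" "$jobtmp/ssh"      # the link name must match rsh_argv[0]
PATH="$jobtmp:$PATH"                # the link's directory is searched first

ssh -p 20000 node01 smpd            # runs the wrapper, not the system ssh
```

With rsh_argv[0] left as "rsh", the link would be named rsh instead; either way the call ends up in the same wrapper, which is why the change is only cosmetic.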

> >
> > 4) Verified that sgeadmin (my sge install/runas
> > account) and myuseraccount had proper
> Although it's okay and fine to create a user, e.g. sgeadmin, for
> owning /usr/sge (or wherever you installed SGE), the started daemons
> should be started as root. Otherwise the sshd can't start up because
> of permission problems, I think.

Yes, you are correct.  This step is not required.

> >
> > 5) Had to make sure in the Queue configuration -
> > General Configuration - Shell was set to /bin/sh
> > (solved a tty error)
> More convenient for some (or most?) SGE installations is to set
> shell_start_mode to unix_behavior. This way the first line of your
> scripts will be honored to specify the shell to be used, as usual.

Well the sample script I was using had the first line

and the SGE configuration had /bin/csh.  I searched
the mailing list and found that someone else had a similar
tty error and this was the fix.  I applied this change
and the tty error went away.  Not sure whether this is a
Linux / Solaris issue or not.
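
For reference, the two queue attributes discussed here would look roughly like this in the queue configuration (as shown by qconf -sq for the queue in question). This is only a fragment, not a complete queue definition, and the combination of both settings is an assumption for illustration; either change on its own is what the thread describes.

```text
shell                 /bin/sh
shell_start_mode      unix_behavior
```

With shell_start_mode set to unix_behavior, the interpreter named on the first line of the job script is used, so the shell entry matters less.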
> >
> > /usr/local/bin/n1ge6/mpich2_smpd/bin/mpiexec -n
> > $NSLOTS -machinefile $MPIR_HOME/machines -p $port
> > $MPIR_HOME/examples/cpi
> >
> Please use the SGE generated machinefile here. This is usually
> created as $TMPDIR/machines and is a unique machinefile with the list
> of the granted nodes for this job. So it will change from job to job.
> If you use a fixed machinefile and any of the mentioned nodes wasn't
> selected by SGE, the qrsh will fail as you may only connect to
> granted machines.

Yes, this was my mistake - I forgot to change that.  I
only have 2 exe hosts right now for testing purposes -
so it didn't make a difference just yet!
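
A minimal sketch of the corrected call, using the per-job machinefile instead of the fixed one. It only prints the command line rather than running mpiexec, so it can be tried outside a job. The fallback values for NSLOTS, TMPDIR and port are assumptions for illustration (inside a job, SGE sets NSLOTS and TMPDIR, and port is expected to be set earlier in the job script), and mpiexec_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: build the mpiexec command line with the SGE generated machinefile.
NSLOTS=${NSLOTS:-2}                 # set by SGE in a real job
TMPDIR=${TMPDIR:-/tmp}              # set by SGE in a real job
port=${port:-20000}                 # assumption: defined earlier in the script
machinefile="$TMPDIR/machines"      # per-job list of the granted nodes

mpiexec_cmd() {
    # Same flags as the quoted command, but with the per-job machinefile.
    echo "mpiexec -n $NSLOTS -machinefile $machinefile -p $port $1"
}

mpiexec_cmd "$MPIR_HOME/examples/cpi"
```

In the actual job script the echo would be dropped and the full path to the mpiexec from the MPICH2 installation used, as in the original command.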

> But already before: were the smpd's now started
> correctly on the nodes?

Yes it's working perfectly!

> Cheers - Reuti

Thanks again - Jonathan
