[GE users] Anybody able to get SGE MPICH2 Tight Integration via SSH working? - SOLVED
jonathanschreiter at yahoo.com
Sat Jan 7 15:48:42 GMT 2006
> > 1) Follow ssh howto and ssh tight integration
> > (with minor changes below)
> > 2) Changes to Reuti's start_mpich2.c
> > rsh_argv="rsh"; to rsh_argv="ssh";
> > rsh_argv="-port"; to rsh_argv="-p";
> By using rsh here, you will use the rsh-wrapper. If you name the link
> in the start_proc_args script ssh as well, so that it points to the
> rsh-wrapper, that's fine; it is just cosmetic and shouldn't affect the
> behavior. Whether it's -p or -port should also make no difference,
> but I must admit that the README in the smpd directory of the MPICH2
> source says -port, the PDF documents say -p, and smpd itself states
> "-port <port> or -p <port>" (this third argument goes to the started
> > then rerun ./aimk and ./install.sh
> > This changes the way
> > $SGE_ROOT/mpich2_smpd/bin/$ARCH/mpich2 calls qrsh to
> > start smpd on the exe hosts from rsh to ssh.
Yes, I just changed to -p to match the manual (don't
think it really matters). I changed the .c file back
to rsh to see if it would make a difference again, and
everything is still working. Guess that fix isn't
> > 4) Verified that sgeadmin (my sge install/runas
> > account) and myuseraccount had proper
> Although it's fine to create a user, e.g. sgeadmin, to own /usr/sge
> (or wherever you installed SGE), the daemons should be started as
> root. Otherwise I think sshd can't start up because of permission
> problems.
Yes, you are correct. This step is not required.
> > 5) Had to make sure in the Queue configuration -
> > General Configuration - Shell was set to /bin/sh
> > (solved a tty error)
> More convenient for some (or most?) SGE installations is to set
> shell_start_mode to unix_behavior. This way the first line of your
> scripts will be honored to specify the shell to use, as usual.
Well, the sample script I was using had the first line
and the SGE configuration had /bin/csh. I searched
the mailing list and found that someone else had a similar
tty error and this was the fix. I applied this change
and the tty error went away. Not sure whether this is a
Linux / Solaris issue or not.
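
To make the difference between the two settings concrete, here is a
rough sketch of which shell ends up running a job script. The helper
name effective_shell is made up for illustration and is not part of
SGE; it also ignores a shell requested with qsub -S, so treat it as a
simplified model of the behavior, not as what sge_execd really does.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): report which shell would run a
# job script under a given shell_start_mode. Simplified: ignores qsub -S.
#   $1 = shell_start_mode (posix_compliant | unix_behavior)
#   $2 = the queue's "shell" setting, e.g. /bin/csh
#   $3 = path to the job script
effective_shell() {
    mode=$1
    queue_shell=$2
    script=$3
    first_line=$(head -n 1 "$script")
    case "$mode:$first_line" in
        unix_behavior:'#!'*)
            # The first line is honored: strip "#!" and any interpreter
            # arguments, leaving only the interpreter path.
            printf '%s\n' "$first_line" |
                sed 's/^#![[:space:]]*//; s/[[:space:]].*$//'
            ;;
        *)
            # posix_compliant (or no "#!" line): the queue's shell is used.
            printf '%s\n' "$queue_shell"
            ;;
    esac
}
```

With the queue shell at /bin/csh, a script whose first line names
/bin/sh would be run by csh under posix_compliant and by sh under
unix_behavior, which is the sort of mismatch described above.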
> > /usr/local/bin/n1ge6/mpich2_smpd/bin/mpiexec -n
> > $NSLOTS -machinefile $MPIR_HOME/machines -p $port
> > $MPIR_HOME/examples/cpi
> Please use the SGE-generated machinefile here. It is usually created
> as $TMPDIR/machines and is a unique machinefile listing the nodes
> granted to this job, so it will change from job to job. If you use a
> fixed machinefile and any of the listed nodes wasn't selected by SGE,
> the qrsh will fail, as you may only connect to granted machines.
Yes, this was my mistake - I forgot to change that. I
only have 2 exe hosts right now for testing purposes -
so it didn't make a difference just yet!
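
For anyone copying the call from above, a sketch of the corrected
version follows. It assumes the job runs under the parallel
environment, so that SGE sets $NSLOTS and $TMPDIR and $TMPDIR/machines
exists. The function name build_mpiexec_cmd is made up for this
example, and the port is passed in as an argument because how it gets
chosen is not shown in this thread.

```shell
#!/bin/sh
# Sketch only: build the mpiexec command line using the per-job machinefile
# in $TMPDIR/machines instead of a fixed one.
#   $1 = path to mpiexec, $2 = smpd port, $3 = program to run
# Assumes $NSLOTS and $TMPDIR are set by SGE for the running job.
build_mpiexec_cmd() {
    mpiexec_bin=$1
    port=$2
    program=$3
    machinefile="$TMPDIR/machines"
    # Refuse to build a command if the per-job machinefile is missing.
    if [ ! -r "$machinefile" ]; then
        echo "no machinefile at $machinefile (not running under the PE?)" >&2
        return 1
    fi
    printf '%s -n %s -machinefile %s -p %s %s\n' \
        "$mpiexec_bin" "$NSLOTS" "$machinefile" "$port" "$program"
}
```

In the job script the only real change from my earlier call is
replacing $MPIR_HOME/machines with $TMPDIR/machines; the function
just makes it easy to check the resulting line before running it.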
> But before that: were the smpds now started correctly on the nodes?
Yes it's working perfectly!
> Cheers - Reuti
Thanks again - Jonathan