[GE users] Anybody able to get SGE MPICH2 Tight Integration via SSH working? - SOLVED

Reuti reuti at staff.uni-marburg.de
Thu Jan 5 22:14:08 GMT 2006


Jonathan:

On 05.01.2006 at 21:01, Jonathan Schreiter wrote:

> Thanks everyone for your help on resolving this issue.
>  I'll post what I did for anyone else who is trying to
> get this working (and for me when I forget what I
> did).
>
> 1) Follow the ssh howto and the ssh tight integration
> howto (with the minor changes below)
>
> 2) Changes to Reuti's start_mpich2.c:
>    rsh_argv[0]="rsh";   changed to   rsh_argv[0]="ssh";
>    rsh_argv[3]="-port"; changed to   rsh_argv[3]="-p";
>

By using rsh here, you will use the rsh-wrapper. If you change it to
ssh and also name the link in the start_proc_args script ssh, so that
it still leads to the rsh-wrapper, that's fine and just cosmetic; it
shouldn't affect the behavior. Whether it's -p or -port should also
make no difference, but I must admit that the README in the smpd
directory of the MPICH2 source says -port, the PDF documents say -p,
and smpd itself states "-port <port> or -p <port>" (this third
argument goes to the started smpd).
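To illustrate why the name of the link is only cosmetic: a start_proc_args script run with -catch_rsh places a link to the rsh-wrapper in $TMPDIR, which SGE puts at the front of the job's PATH, so whatever name the link has (rsh or ssh) is the command that gets intercepted. A minimal sketch, assuming a hypothetical helper link_rsh_wrapper and a wrapper at $SGE_ROOT/mpi/rsh (check your own start script for the real path):

```shell
#!/bin/sh
# Sketch of what a start_proc_args script does with -catch_rsh.
# link_rsh_wrapper is a hypothetical helper; the wrapper path is an assumption.

# link_rsh_wrapper WRAPPER DIR NAME
# Creates DIR/NAME as a symlink to WRAPPER. With DIR = $TMPDIR at the front
# of the job's PATH, a call to NAME (rsh or ssh) then runs the rsh-wrapper.
link_rsh_wrapper() {
    wrapper=$1
    dir=$2
    name=$3
    if [ ! -x "$wrapper" ]; then
        echo "rsh-wrapper $wrapper not found or not executable" >&2
        return 1
    fi
    ln -s "$wrapper" "$dir/$name"
}

# In a real start_proc_args script this could read, for example:
#   link_rsh_wrapper "$SGE_ROOT/mpi/rsh" "$TMPDIR" rsh
# or, with the cosmetic renaming discussed above:
#   link_rsh_wrapper "$SGE_ROOT/mpi/rsh" "$TMPDIR" ssh
```

Either way the call ends up in the same wrapper, which is why changing rsh_argv[0] alone doesn't change the behavior.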

> Then reran ./aimk and ./install.sh.
>
> This changes the command that
> $SGE_ROOT/mpich2_smpd/bin/$ARCH/mpich2 uses to start
> smpd on the exec hosts (via qrsh) from rsh to ssh.
>
> 3) Turned off the firewall on the master host (not sure
> if this is required; I'll try tweaking this install
> later)
>
> 4) Verified that sgeadmin (my SGE install/run-as
> account) and myuseraccount had proper

Although it's fine to create a user, e.g. sgeadmin, to own
/usr/sge (or wherever you installed SGE), the daemons themselves
should be started as root. Otherwise sshd can't start up because of
permission problems, I think.

> authorized_keys (one per line) for each exec host and the
> master host. I used ssh-keygen -d without a passphrase.
>
> 5) Had to make sure that in the queue configuration
> (General Configuration, Shell) the shell was set to
> /bin/sh (this solved a tty error)

For some (or most?) SGE installations it's more convenient to set
shell_start_mode to unix_behavior. This way the first line of your
script (e.g. #!/bin/sh) is honored to select the shell, as usual.
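For reference, the two queue attributes involved, as they would appear in the output of qconf -sq; the queue name all.q is only an example, and qconf -mq all.q opens the same configuration in an editor for changing it:

```
qname                 all.q
shell                 /bin/sh
shell_start_mode      unix_behavior
```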

>
> 6) Don't forget to make sure that the parallel
> environment's allocation_rule is set to $round_robin
> instead of the default. Reuti's doc shows that, but I
> missed it. This is important so that the correct list
> is generated for the machines file.
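The relevant part of such a parallel environment, as shown by qconf -sp; the PE name and slot count are only examples, and start_proc_args/stop_proc_args (not shown) stay as given in the howto. control_slaves TRUE is what allows the qrsh -inherit calls a tight integration relies on:

```
pe_name            mpich2_smpd
slots              8
allocation_rule    $round_robin
control_slaves     TRUE
```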
>
> 7) I modified the sample script to this and ran it:
> port=$((JOB_ID % 5000 + 20000))
>
> echo "Got $NSLOTS slots."
>
> /usr/local/bin/n1ge6/mpich2_smpd/bin/mpiexec -n $NSLOTS \
>     -machinefile $MPIR_HOME/machines -p $port \
>     $MPIR_HOME/examples/cpi
>

Please use the SGE-generated machinefile here. It is usually created
as $TMPDIR/machines and is unique to this job, listing only the nodes
granted to it, so it will change from job to job. If you use a fixed
machinefile and any of the listed nodes wasn't selected by SGE, qrsh
will fail, as you may only connect to granted machines.
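A sketch of the job script using the SGE-generated machinefile, assuming the PE's start script writes it to $TMPDIR/machines. The helper functions smpd_port and run_job and the variable MPICH2_HOME (defaulting to the mpiexec location from your mail) are hypothetical, only for illustration, as is the guard that skips the launch outside of SGE:

```shell
#!/bin/sh
# Sketch of the job script body; smpd_port, run_job and MPICH2_HOME are
# hypothetical names used only for illustration.

# Derive a per-job smpd port in the range 20000..24999 from the job ID.
smpd_port() {
    echo $(( $1 % 5000 + 20000 ))
}

run_job() {
    # Hypothetical override; default is the mpiexec location from the post.
    mpich2_home=${MPICH2_HOME:-/usr/local/bin/n1ge6/mpich2_smpd}
    port=$(smpd_port "$JOB_ID")

    echo "Got $NSLOTS slots."

    # $TMPDIR/machines lists only the nodes SGE granted to this job.
    "$mpich2_home/bin/mpiexec" -n "$NSLOTS" \
        -machinefile "$TMPDIR/machines" -p "$port" \
        "$MPIR_HOME/examples/cpi"
}

# JOB_ID is set by SGE; skip the launch when run outside a job.
if [ -n "$JOB_ID" ]; then
    run_job
fi
```

For job 12345 this uses port 22345; two jobs only collide on a port when their IDs differ by a multiple of 5000.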

But even before that: were the smpds started correctly on the nodes this time?

Cheers - Reuti

> I'm sure I'll have many more questions down the
> line...but for now...
>
> Thanks again everyone!
> Jonathan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
