[GE users] RE : [GE users] SGE 6.2 : use of ssh for qlogin/qrsh

reuti reuti at staff.uni-marburg.de
Sun Nov 23 17:49:53 GMT 2008



On 23.11.2008 at 18:34, igardais wrote:

> I was not able to make startmpi's '-catch_rsh' option work with
> PEs.

By default, "ssh" is compiled into MPICH(2), and I think the same
holds for Intel MPI. So you would have needed an ssh wrapper to catch
the call, or you would have had to instruct MPICH(2) to use rsh instead
of the compiled-in ssh. But even then, you wouldn't have a Tight
Integration.
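
A minimal sketch of what such an ssh wrapper could look like, assuming
it is installed under the name "ssh" ahead of the real one in the job's
PATH. The function name ssh_to_qrsh and the set of skipped flags
(-n, -q, -x) are assumptions for illustration only; the sketch prints
the qrsh call instead of executing it:

```shell
#!/bin/sh
# Hypothetical wrapper logic: translate "ssh [flags] host command..."
# into the equivalent "qrsh -inherit host command..." call.
ssh_to_qrsh() {
    # Skip leading ssh flags that take no argument (assumed set).
    while [ $# -gt 0 ]; do
        case "$1" in
            -n|-q|-x) shift ;;
            *) break ;;
        esac
    done
    # A host and at least one command word are required.
    if [ $# -lt 2 ]; then
        echo "usage: ssh [-n|-q|-x] host command..." >&2
        return 1
    fi
    host=$1
    shift
    # A real wrapper would 'exec' this command instead of printing it.
    echo "qrsh -inherit $host $*"
}

ssh_to_qrsh -x pc15381 /home/reuti/local/mpich2-1.0.8/bin/mpd
```

As noted above, catching the call this way would not by itself give a
Tight Integration.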

> So I used the startmpi script only to generate a machine file and
> forced the mpdboot/mpirun command inside the submitted script to
> use ssh (we use the Intel MPI implementation).

I have never tried Intel MPI, as we don't have it. But if it's based on
MPICH2, it might work. You can already download
http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-60.tgz
which also includes the mpd startup method for MPICH2, and this might
apply to Intel MPI as well. At least the necessary scripts seem to be
there when I peek into their runtime package.

Beware: you will have to set ENABLE_ADDGRP_KILL=TRUE in the
execd_params section of SGE's configuration.
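
The setting can be edited with "qconf -mconf" and inspected with
"qconf -sconf". A small sketch for checking it, assuming the
configuration text is piped in on stdin; the helper name
has_addgrp_kill is made up for this example:

```shell
#!/bin/sh
# Succeeds if the configuration text on stdin contains
# ENABLE_ADDGRP_KILL=TRUE on its execd_params line.
has_addgrp_kill() {
    awk '$1 == "execd_params" && toupper($0) ~ /ENABLE_ADDGRP_KILL=TRUE/ { found = 1 }
         END { exit !found }'
}

# Sample line as it might appear in the configuration (assumed layout).
printf 'execd_params                 ENABLE_ADDGRP_KILL=TRUE\n' |
    has_addgrp_kill && echo "enabled"
```

On a live cluster the check would read: qconf -sconf | has_addgrp_kill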

mpdboot can't be used, as it would fork off the remote processes
unconditionally. I start all of them by hand, even on the master node,
with a qrsh to bind them to the shepherd; so you will have to compile
and install a helper program, as for the smpd startup method. A 4-CPU
job on two nodes should give the following on the master of the two:


 3385 ?        Sl     1:16 /usr/sge/bin/lx24-x86/sge_execd
 9976 ?        S      0:00  \_ sge_shepherd-515 -bg
10047 ?        Ss     0:00  |   \_ /bin/sh /var/spool/sge/pc15370/job_scripts/515
10048 ?        S      0:00  |       \_ python2.5 /home/reuti/local/mpich2-1.0.8/bin/mpiexec -machinefile /tmp/515.1.all.q/machines -n 4 /home/reuti/mpihello
10009 ?        Sl     0:00  \_ sge_shepherd-515 -bg
10010 ?        Ss     0:00      \_ /usr/sge/utilbin/lx24-x86/qrsh_starter /var/spool/sge/pc15370/active_jobs/515.1/1.pc15370
10019 ?        S      0:00          \_ python2.5 /home/reuti/local/mpich2-1.0.8/bin/mpd
10049 ?        S      0:00              \_ python2.5 /home/reuti/local/mpich2-1.0.8/bin/mpd
10052 ?        R      0:02              |   \_ /home/reuti/mpihello
10050 ?        S      0:00              \_ python2.5 /home/reuti/local/mpich2-1.0.8/bin/mpd
10051 ?        R      0:04                  \_ /home/reuti/mpihello
10000 ?        Sl     0:00 /usr/sge/bin/lx24-x86/qrsh -inherit -V pc15370 /home/reuti/local/mpich2-1.0.8/bin/mpd
10028 ?        Sl     0:00 /usr/sge/bin/lx24-x86/qrsh -inherit -V pc15381 /home/reuti/local/mpich2-1.0.8/bin/mpd -h pc15370 -p 6860 -n
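
The two qrsh calls at the end of the listing follow a simple pattern:
one plain mpd on the master host, then one mpd per further host that
joins the master's ring. A sketch that only prints these calls; the
function name print_mpd_startup is made up here, and the port is passed
in as an argument, although in practice it has to be obtained from the
running master mpd first:

```shell
#!/bin/sh
# Print the qrsh calls that would start an mpd ring.
# Arguments: mpd_path master_port master_host [other_host...]
print_mpd_startup() {
    mpd=$1 port=$2 master=$3
    shift 3
    # First mpd on the master host, started without ring arguments.
    echo "qrsh -inherit -V $master $mpd"
    # Every further host joins the ring of the master's mpd.
    for host in "$@"; do
        echo "qrsh -inherit -V $host $mpd -h $master -p $port -n"
    done
}

print_mpd_startup /home/reuti/local/mpich2-1.0.8/bin/mpd 6860 pc15370 pc15381
```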


And in the job.sh.pe<job_id>:

-catch_rsh /var/spool/sge/pc15370/active_jobs/515.1/pe_hostfile /home/reuti/local/mpich2-1.0.8
pc15370:2
pc15381:2
startmpich2.sh: check for local mpd daemon (1 of 10)
/usr/sge/bin/lx24-x86/qrsh -inherit -V pc15370 /home/reuti/local/mpich2-1.0.8/bin/mpd
startmpich2.sh: check for local mpd daemon (2 of 10)
startmpich2.sh: check for mpd daemons (1 of 10)
/usr/sge/bin/lx24-x86/qrsh -inherit -V pc15381 /home/reuti/local/mpich2-1.0.8/bin/mpd -h pc15370 -p 6860 -n
startmpich2.sh: check for mpd daemons (2 of 10)
startmpich2.sh: got all 2 of 2 nodes
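
The "host:slots" lines in this output correspond to the first two
columns of the pe_hostfile (host, slots, queue, processor range). A
sketch of that conversion, not the actual code from startmpich2.sh; the
function name and the sample queue and processor-range values are
assumptions:

```shell
#!/bin/sh
# Convert pe_hostfile lines ("host slots queue processor-range") read
# from stdin into "host:slots" machine-file lines.
pe_hostfile_to_machines() {
    awk 'NF >= 2 { print $1 ":" $2 }'
}

# Sample pe_hostfile content for the two-node job (assumed values).
printf '%s\n' \
    'pc15370 2 all.q@pc15370 UNDEFINED' \
    'pc15381 2 all.q@pc15381 UNDEFINED' | pe_hostfile_to_machines
```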

Hopefully I will manage to update the Howto as well in the coming
week. For now: it's similar to the smpd startup in the Howto.

Cheers - Reuti

> Also, at that time, I was not aware that SGE's rsh does not use the
> xinetd rsh (and I want to avoid opening the r* commands on each
> node), so the use of ssh comforted me in this choice.
>
> Ionel
>
>
>
> -------- Original message --------
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Date: Sun. 23/11/2008 18:14
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] SGE 6.2 : use of ssh for qlogin/qrsh
>
> Hi,
>
> On 23.11.2008 at 16:48, igardais wrote:
>
>> I'm glad to announce that I successfully upgraded our cluster to
>> SGE 6.2.
>> However, when we were running 6.1 and earlier, I configured the
>> cluster-wide rsh_daemon and rlogin_daemon to use 'sshd -i'.
>> During the 6.2 upgrade, I chose to use the new IJS mode: the
>> parameters now read "builtin".
>>
>> Are the reporting and loss-of-control issues closed with 6.2, so
>> that I can keep using public-key ssh?
>>
> you will have to compile SGE on your own with the switch "-tight-ssh".
>
>> Or should I try to make our MPI programs use that 'builtin'
>> method so that we will be 'compatible' with future releases of SGE?
>>
>
> What was your reason for using ssh in the past?
>
> -- Reuti
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=89596
>
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=89603

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


