[GE users] tight integration problem

Jean-Paul Minet minet at cism.ucl.ac.be
Thu Jan 26 10:35:05 GMT 2006



Reuti,

I have found...

lemaitre /usr/local/mpich-eth-intel/lib # strings libmpich.a |grep rsh
rsh program (Some versions of Kerberos rsh have been observed to have this
/usr/bin/rsh

so the user program (abinip_eth) has this same /usr/bin/rsh linked in.
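For context, my understanding of the tight integration is that the rsh link SGE 
creates in $TMPDIR only helps if "rsh" is resolved through $PATH; the wrapper 
then essentially turns the rsh call into a qrsh -inherit. Roughly like this (a 
simplified sketch of the idea only, not the actual script shipped with SGE; the 
qrsh path is the one from our cluster):

#!/bin/sh
# Simplified sketch of a tight-integration rsh wrapper: MPICH calls
#   rsh <host> -l <user> -n <command ...>
# and the wrapper forwards this as a qrsh -inherit, so the remote
# process starts under SGE's control (and gets accounted).
host=$1; shift
# strip the "-l <user>" and "-n" options that MPICH's p4 device adds
while [ "$1" = "-l" -o "$1" = "-n" ]; do
  if [ "$1" = "-l" ]; then shift; fi
  shift
done
exec /gridware/sge/bin/lx24-amd64/qrsh -inherit $host "$@"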

Maybe we should reinstall MPICH with the proper rsh (i.e. without the full 
path)... Still, this wouldn't explain why the rsh wrapping works under certain 
conditions with the same user binary.
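If we go that route, I imagine the rebuild would look roughly like this (a 
sketch only; I am assuming that our MPICH 1.2.x ch_p4 build honours the 
RSHCOMMAND variable and the --with-device option at configure time, to be 
checked against the MPICH installation guide; source directory and version 
below are just examples):

# Rebuild MPICH so the p4 device calls plain "rsh" (resolved via $PATH
# at run time, so SGE's $TMPDIR wrapper gets picked up) instead of a
# hard-coded /usr/bin/rsh.
cd /usr/local/src/mpich-1.2.7            # example source location
export RSHCOMMAND=rsh                    # no absolute path
./configure --prefix=/usr/local/mpich-eth-intel --with-device=ch_p4
make
make install

# afterwards the absolute path should be gone from the library:
strings /usr/local/mpich-eth-intel/lib/libmpich.a | grep rsh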

jp

Reuti wrote:
> Hi,
> 
> Am 26.01.2006 um 09:47 schrieb Jean-Paul Minet:
> 
>> Reuti,
>>
>>> In short: don't use -nolocal! This will exclude the starting node
>>> from all starts of MPICH processes and will lead to an uneven
>>> process distribution. With SGE looking at the number of issued
>>> qrsh's, this might break the complete setup. And what you see is
>>> that a slave node becomes the new master of the MPICH program.
>>> It looks okay, but is of course wrong for the Tight Integration.
>>
>>
>> The fact that -nolocal was included in the submission script is
>> somewhat historical. It's clear to me that the -nolocal switch is
>> not desirable, since we want the MPICH "head-node" to do some work
>> as well. But, during my troubleshooting of the qrsh wrapping scheme,
>> it coincidentally happened that, while the -nolocal switch was there,
>> I could see some qrsh issued (and could therefore conclude that the
>> rsh redirection through the /tmp/... directory and PATH fiddling was
>> working). I then wanted to remove this offending -nolocal switch,
>> but then any reference to qrsh disappears... (see below for ps -ef
>> output)!
>>
>>> What do you mean by master-node? The master-node of the cluster or
>>> the "head-node" of the parallel job? With MPICH, the "head-node" of
>>> the parallel job (which is actually a conventional exec node in the
>>> cluster) will also do some work (therefore don't use -nolocal). This
>>> is the one you see in the "qstat" output (or named "MASTER" with
>>> "qstat -g t").
>>
>>
>> I indeed meant the SGE MASTER, which is the mpich head node.  No  
>> intention at all to deliberately use -nolocal (just by accident...)
>>
>>> Can you please post the "ps -e f" output when not using -nolocal?
>>> Did you also check in the default output file of the PE (.po) that
>>> the hostnames listed there are the ones you get from the command
>>> "hostname" on the nodes? Otherwise MPICH might fail to subtract one
>>> process on the "head-node" of the parallel job.
>>>
>>> With the Tight Integration, a normal user can also just use qrsh.
>>> So, I'd suggest submitting a small script with the proper -pe
>>> request:
>>> #!/bin/sh
>>> cat $TMPDIR/machines
>>> sleep 120
>>> and checking on the head-node of the parallel job that the link to
>>> the rsh wrapper was created in the intended way in $TMPDIR. Is
>>> $SGE_ROOT mounted nosuid on the nodes? This might explain why only
>>> root can do it.
>>
>>
>> Before listing the ps -ef output, let me confirm that I checked that,
>> on the MPICH head-node, the /tmp/<job number>/rsh symbolic link is
>> there, pointing to /sge/gridware/mpi/bin/rsh, and that, in the same
>> temporary directory, the machine file seems correct (also confirmed
>> by the .po output file). As far as qrsh is concerned, root can use it
>> interactively, but I just realized that normal users get the same
>> error. SGE was installed (by Sun on delivery of the cluster) on each
>> node (no remote mount), which doesn't seem right to me; I would
>> prefer an NFS mount. You mentioned earlier that no SUID is required
>> on specific SGE binaries, but in the SGE HowTos (
> 
> 
> Correct, my remark was a question whether it was mounted by accident
> with nosuid, which would prevent the SUID bit from taking effect, so
> it wouldn't work. Sorry for the confusion.
> 
>> http://gridengine.sunsource.net/howto/commonproblems.html#interactive),
>> there is some mention of SUID for utilbin rlogin and rsh. Anyway, I
>> followed that, and I still get the error:
>>
>> qrsh -verbose -l mem_free=10M -l num_proc=2 -q all.q@lmexec-100 date
>> your job 2402 ("date") has been submitted
>> waiting for interactive job to be scheduled ...
>> Your interactive job 2402 has been successfully scheduled.
>> Establishing /gridware/sge/utilbin/lx24-amd64/rsh session to host  
>> lmexec-100 ...
>> rcmd: socket: Permission denied
>>
> 
> Mmh, okay, let's first look at a possibly wrong $PATH. Can you put an
> "echo $PATH" in your jobscript? Usually $TMPDIR is the first entry in
> the created $PATH. If you change the $PATH, $TMPDIR must again be put
> in first place to access the wrapper.
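
For reference, here is roughly what I will put in the jobscript to check this 
(a sketch; the mpirun path and the binary are the ones from our setup, $NSLOTS 
and $TMPDIR are the variables SGE sets for the job):

#!/bin/sh
# check that the SGE-created $TMPDIR (holding the rsh wrapper link)
# is the first entry of $PATH
echo $PATH
# if the script modifies PATH, $TMPDIR has to be put back in front,
# otherwise "rsh" falls back to /usr/bin/rsh
PATH=$TMPDIR:$PATH
export PATH
/usr/local/mpich-eth-intel/bin/mpirun -np $NSLOTS \
    -machinefile $TMPDIR/machines \
    /home/pan/minet/abinit/parallel_eth/abinip_eth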
> 
> Another thing: what rsh statement is compiled in?
> 
> strings abinip_eth | egrep "(rsh|ssh)"
> 
> -- Reuti
> 
> 
>> on lmexec-100, I have:
>> -rwxr-xr-x  1 sgeadmin root  194380 Jul 22  2005 qrsh_starter
>> -r-sr-xr-x  1 root     root   32607 Jul 22  2005 rlogin
>> -r-sr-xr-x  1 root     root   22180 Jul 22  2005 rsh
>> -rwxr-xr-x  1 sgeadmin root  218778 Jul 22  2005 rshd
>>
>> Here is the ps -ef output on the head node for an MPICH job (np=2)
>> when -nolocal is not used:
>>
>> sgeadmin  1181  3492  0 08:43 ?        00:00:00 sge_shepherd-2404 -bg
>> minet     1205  1181  0 08:43 ?        00:00:00 bash /var/spool/sge/lmexec-94/job_scripts/2404
>> minet     1208  1205  0 08:43 ?        00:00:00 /bin/sh /usr/local/mpich-eth-intel/bin/mpirun -np 2 -machinefile /tmp/2404.1.all.q/machines abinip_e
>> minet     1292  1208  0 08:43 ?        00:00:12 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI1208 -p4w
>> minet     1293  1292  0 08:43 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI1208 -p4w
>> minet     1294  1292  0 08:43 ?        00:00:00 /usr/bin/rsh lmexec-72 -l minet -n /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-94 32858 \-
>>
>> We indeed have the node doing some work... but the connection to the
>> "slave" is through /usr/bin/rsh instead of qrsh. On this head node,
>> we also have:
>>
>> lmexec-94 /tmp/2404.1.all.q # ls -al
>> total 12
>> drwxr-xr-x  2 minet grppcpm 4096 Jan 26 08:42 .
>> drwxrwxrwt  9 root  root    4096 Jan 26 08:42 ..
>> -rw-r--r--  1 minet grppcpm   20 Jan 26 08:42 machines
>> lrwxrwxrwx  1 minet grppcpm   21 Jan 26 08:42 rsh -> /gridware/sge/mpi/rsh
>>
>> and also:
>> lmexec-94 /tmp/2404.1.all.q # cat machines
>> lmexec-94
>> lmexec-72
>>
>> but somehow, the rsh wrapping mechanism gets bypassed somewhere...
>>
>> I hope I have been clear in my replies so that it helps you to help  
>> me ;-)
>>
>> Thanks again for your support
>>
>> Jean-Paul
>>
>>> HTH  -Reuti
>>> Am 25.01.2006 um 15:42 schrieb Jean-Paul Minet:
>>>
>>>> Reuti,
>>>>
>>>> I am totally lost with this tight integration...
>>>>
>>>> 1) as root user, if I use the -nolocal flag as an mpirun argument, I
>>>> end up with the following processes on the "master node":
>>>>
>>>> root      5349  5326  0 14:27 ?        00:00:00 bash /var/spool/sge/lmexec-121/job_scripts/2375
>>>> root      5352  5349  0 14:27 ?        00:00:00 /bin/sh /usr/local/mpich-eth-intel/bin/mpirun -nolocal -np 2 -machinefile /tmp/2375.1.all.q/machines abinip_eth
>>>> root      5446  5352  0 14:27 ?        00:00:00 /gridware/sge/bin/lx24-amd64/qrsh -inherit lmexec-62 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/
>>>> root      5454  5446  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/rsh -p 32816 lmexec-62 exec '/gridware/sge/utilbin/lx24-amd64/qrsh_starter' '/v
>>>> root      5455  5454  0 14:27 ?        00:00:00 [rsh] <defunct>
>>>>
>>>> and on the slave node:
>>>>
>>>> sgeadmin 14300  3464  0 14:27 ?        00:00:00 sge_shepherd-2375 -bg
>>>> root     14301 14300  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/rshd -l
>>>> root     14302 14301  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/qrsh_starter /var/spool/sge/lmexec-62/active_jobs/2375.1/1.lmexec-62
>>>> root     14303 14302 27 14:27 ?        00:00:31 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI5352 -p4wd /home/pan/minet/abinit/paralle
>>>> root     14304 14303  0 14:27 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI5352 -p4wd /home/pan/minet/abinit/paralle
>>>> root     14305 14303  0 14:27 ?        00:00:00 /usr/bin/rsh lmexec-62 -l root -n /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 \-p4amslave \-p4yourname lmexec-62
>>>> root     14306  3995  0 14:27 ?        00:00:00 in.rshd -aL
>>>> root     14307 14306 86 14:27 ?        00:01:38 /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 -p4amslave -p4yourname lmexec-62 -p4rmrank 1
>>>> root     14357 14307  0 14:27 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 -p4amslave -p4yourname lmexec-62 -p4rmrank 1
>>>>
>>>> So, I can see that, in a way, the SGE qrsh/rsh/qrsh_starter are
>>>> coming into play and that the sge_shepherd is initiating the remote
>>>> processes. Nevertheless:
>>>>
>>>> - as expected, there is no local instance of the program running on
>>>> the master node, which is not what we want.
>>>> - the slave node issues an rsh onto itself; is that expected?
>>>>
>>>> Under these conditions, qstat -ext reports 0 usage (cpu/mem).
>>>>
>>>> If I don't use this -nolocal flag, then the rsh/qrsh wrapper
>>>> mechanism doesn't seem to come into play, and the master node does
>>>> a direct rsh to the slave node. In these conditions, qstat -ext
>>>> reports cpu time (from a single process, which is also expected
>>>> since there is no SGE control in this case).
>>>>
>>>> All in all, I don't see how this -nolocal flag can make the rsh   
>>>> wrapper appear to work or fail.
>>>>
>>>> 2) as a non-root user, the first scenario doesn't work, as I get an
>>>> "error:rcmd: permission denied". The second scenario works as it
>>>> does for the root user.
>>>>
>>>> Quite a bit lost...
>>>>
>>>> Jean-Paul
>>>>
>>>> Reuti wrote:
>>>>
>>>>> Hi Jean-Paul,
>>>>> Am 23.01.2006 um 14:31 schrieb Jean-Paul Minet:
>>>>>
>>>>>> Reuti,
>>>>>>
>>>>>>> For using qrsh, the /etc/hosts.equiv isn't necessary. I set this
>>>>>>> just to reflect the login node on all exec nodes to allow
>>>>>>> interactive qrsh/qlogin sessions.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> OK, got this.
>>>>>>
>>>>>>> As qrsh will use a chosen port: any firewall and/or
>>>>>>> /etc/hosts.(allow|deny) configured? - Reuti
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> No firewall nor hosts.xxx. The problem came from wrong modes set
>>>>>> on rsh/rlogin on the exec nodes (I had played with those following
>>>>>> some hints for qrsh problem solving in the SGE FAQ, which probably
>>>>>> messed up everything).
>>>>>>
>>>>>> MPI jobs can now run with qrsh... CPU time displayed by
>>>>>> "qstat -ext" is no longer 0... but it corresponds to a single cpu!
>>>>>>
>>>>>> 2218 0.24170 0.24169 Test_abini minet NA grppcpm r 0:00:12:09 205.41073 0.00000 74727 0 0 71428 3298 0.04 all.q@lmexec-88 2
>>>>>>
>>>>>> This job started about 12 minutes earlier, and runs on 2  cpus.    
>>>>>> Shouldn't the displayed "cpu" be the sum of all cpu  times or is   
>>>>>> this the correct behavior?
>>>>>>
>>>>>> Thks for your input
>>>>>>
>>>>> is "qstat -j 2218" giving you more reasonable results in the   
>>>>> "usage  1:" line? As "qstat -g t -ext" will also display the  CPU  
>>>>> time for  slave processes, these should be per process. -  Reuti
>>>>>
>>>>>> jp
>>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
> 
> 
> 
> 
> 
> 

-- 
Jean-Paul Minet
Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
Université Catholique de Louvain
Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52




