[GE users] tight integration problem

Reuti reuti at staff.uni-marburg.de
Thu Jan 26 10:43:30 GMT 2006



On 26.01.2006, at 11:35, Jean-Paul Minet wrote:

> Reuti,
>
> I have found...
>
> lemaitre /usr/local/mpich-eth-intel/lib # strings libmpich.a |grep rsh
> rsh program (Some versions of Kerberos rsh have been observed to have this
> /usr/bin/rsh
>
> so the user program (abinip_eth) has got this same /usr/bin/rsh linked in.
>
> Maybe we should re-install mpich with the proper rsh (i.e. without the
> full path)... still this wouldn't explain why the rsh wrapping works
> under certain conditions with the same user binary code.
>

Well, this would also mean recompiling - or at least relinking - your
application, as the .a libs are already linked into the executable. - Reuti
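
As a rough sketch of the rebuild (assuming MPICH 1.2.x with the ch_p4
device; the configure details and the link step below are only guesses,
please check the docs of your exact version): the remote shell has to be
compiled in as a plain "rsh", so that the wrapper link in $TMPDIR is found
via $PATH at run time.

  # rebuild MPICH so that it calls "rsh" without an absolute path
  cd mpich-1.2.x                       # hypothetical source directory
  export RSHCOMMAND=rsh                # plain name instead of /usr/bin/rsh
  ./configure --prefix=/usr/local/mpich-eth-intel
  make && make install

  # relink the application against the new libmpich.a
  # (use whatever link command the Abinit build normally runs)

  # verify what ended up in the binary
  strings abinip_eth | egrep "(rsh|ssh)"

After the relink, "strings" should report a bare "rsh" instead of
/usr/bin/rsh, and the rsh wrapper of the Tight Integration will be picked
up again.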

> jp
>
> Reuti wrote:
>> Hi,
>> On 26.01.2006, at 09:47, Jean-Paul Minet wrote:
>>> Reuti,
>>>
>>>> In short: don't use -nolocal! This will exclude the starting node
>>>> from all starts of an MPICH process and will lead to an uneven
>>>> process distribution. With SGE looking at the number of issued
>>>> qrsh's, this might break the complete setup. And what you see is
>>>> that a slave node becomes the new master of the MPICH program.
>>>> Looks okay, but of course wrong for the Tight Integration.
>>>
>>>
>>> The fact that -nolocal was included in the submission script is
>>> somehow historical. It is indeed clear to me that the -nolocal
>>> switch is not desirable, so that the mpich "head-node" does some
>>> work too. But, during my "troubleshooting" of the qrsh wrapping
>>> scheme, it coincidentally happened that, while the -nolocal switch
>>> was there, I could see some qrsh issued (and therefore could
>>> conclude that the rsh redirection through the /tmp/... directory
>>> and the PATH fiddling was working). I then wanted to remove this
>>> offending -nolocal switch, but then any reference to qrsh
>>> disappears... (see below for the ps -ef output)!
>>>
>>>> What do you mean by master-node? The master-node of the cluster
>>>> or the "head-node" of the parallel job? With MPICH, also the
>>>> "head-node" of the parallel job (which is actually a conventional
>>>> exec node in the cluster) will do some work (therefore don't use
>>>> -nolocal). This is the one you see in the "qstat" output (or named
>>>> "MASTER" with "qstat -g t").
>>>
>>>
>>> I indeed meant the SGE MASTER, which is the mpich head node.  No   
>>> intention at all to deliberately use -nolocal (just by accident...)
>>>
>>>> Can you please post the "ps -e f" output when not using -nolocal?
>>>> Did you also check in the default output file of the PE (.po) that
>>>> the hostnames listed there are the ones you get from the command
>>>> "hostname" on the nodes? Otherwise MPICH might fail to subtract
>>>> one process on the "head-node" of the parallel job.
>>>>
>>>> Using the Tight Integration, also a normal user can just use
>>>> qrsh. So, I'd suggest submitting a small script with the proper
>>>> -pe request:
>>>> #!/bin/sh
>>>> cat $TMPDIR/machines
>>>> sleep 120
>>>> and check on the head-node of the parallel job that the link to
>>>> the rsh wrapper was created in the intended way in $TMPDIR. Is
>>>> $SGE_ROOT mounted nosuid on the nodes? This might explain why only
>>>> root can do it.
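
(A quick way to check the nosuid question above, on one of the exec nodes
- assuming $SGE_ROOT is /gridware/sge as in your listings and is a mount
point at all:)

  mount | grep sge          # or: grep sge /proc/mounts
  # look for "nosuid" among the mount options of the SGE file system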
>>>
>>>
>>> Before listing the ps -ef output, let me confirm that I checked
>>> that, on the mpich head-node, the /tmp/<job dir>/rsh symbolic link
>>> is there, pointing to /gridware/sge/mpi/rsh, and that, in the same
>>> temporary directory, the machine file seems correct (also confirmed
>>> by the .po output file). As far as qrsh is concerned, root can use
>>> it interactively, but I just realized that normal users get the
>>> same error. SGE was installed (by Sun on delivery of the cluster)
>>> on each node (no remote mount) and this seems not right to me; I
>>> would prefer an NFS mount. You mentioned earlier that no SUID is
>>> required on specific SGE binaries, but in the SGE HowTo's (
>> Correct, my remark was a question whether it was mounted by
>> accident with nosuid, which would prevent the SUID bit from taking
>> effect and so it wouldn't work. Sorry for the confusion.
>>> http://gridengine.sunsource.net/howto/commonproblems.html#interactive),
>>> there is some mention of SUID for utilbin rlogin and rsh. Anyway,
>>> I followed that, and I still get the error:
>>>
>>> qrsh -verbose -l mem_free=10M -l num_proc=2 -q all.q@lmexec-100 date
>>> your job 2402 ("date") has been submitted
>>> waiting for interactive job to be scheduled ...
>>> Your interactive job 2402 has been successfully scheduled.
>>> Establishing /gridware/sge/utilbin/lx24-amd64/rsh session to host lmexec-100 ...
>>> rcmd: socket: Permission denied
>>>
>> Mmh, okay, let's first look at the wrong $PATH. Can you put an
>> "echo $PATH" in your jobscript? Usually $TMPDIR is the first entry
>> in the created $PATH. If you change the $PATH, $TMPDIR must again
>> be put in first place to access the wrapper.
>> Another thing: what rsh statement is compiled in?
>> strings abinip_eth | egrep "(rsh|ssh)"
>> -- Reuti
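
A minimal sketch of such a jobscript (the mpirun line is only a
placeholder for your actual Abinit call, and assumes the PE sets up
$TMPDIR/machines as usual):

  #!/bin/sh
  echo $PATH                # $TMPDIR should be the first entry
  # if the script (or a sourced file) rewrites PATH, put $TMPDIR back in front:
  export PATH=$TMPDIR:$PATH
  which rsh                 # should now point to the rsh wrapper link in $TMPDIR
  mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./abinip_eth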
>>> on lmexec-100, I have:
>>> -rwxr-xr-x  1 sgeadmin root  194380 Jul 22  2005 qrsh_starter
>>> -r-sr-xr-x  1 root     root   32607 Jul 22  2005 rlogin
>>> -r-sr-xr-x  1 root     root   22180 Jul 22  2005 rsh
>>> -rwxr-xr-x  1 sgeadmin root  218778 Jul 22  2005 rshd
>>>
>>> Here is the ps -ef output on the head node for an mpich job
>>> (np=2) when -nolocal is not used:
>>>
>>> sgeadmin  1181  3492  0 08:43 ?        00:00:00 sge_shepherd-2404 -bg
>>> minet     1205  1181  0 08:43 ?        00:00:00 bash /var/spool/sge/lmexec-94/job_scripts/2404
>>> minet     1208  1205  0 08:43 ?        00:00:00 /bin/sh /usr/local/mpich-eth-intel/bin/mpirun -np 2 -machinefile /tmp/2404.1.all.q/machines abinip_e
>>> minet     1292  1208  0 08:43 ?        00:00:12 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI1208 -p4w
>>> minet     1293  1292  0 08:43 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI1208 -p4w
>>> minet     1294  1292  0 08:43 ?        00:00:00 /usr/bin/rsh lmexec-72 -l minet -n /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-94 32858 \-
>>>
>>> We indeed have the node doing some work... but the connection to
>>> the "slave" is through /usr/bin/rsh instead of qrsh. On this head
>>> node, we also have:
>>>
>>> lmexec-94 /tmp/2404.1.all.q # ls -al
>>> total 12
>>> drwxr-xr-x  2 minet grppcpm 4096 Jan 26 08:42 .
>>> drwxrwxrwt  9 root  root    4096 Jan 26 08:42 ..
>>> -rw-r--r--  1 minet grppcpm   20 Jan 26 08:42 machines
>>> lrwxrwxrwx  1 minet grppcpm   21 Jan 26 08:42 rsh -> /gridware/sge/mpi/rsh
>>>
>>> and also:
>>> lmexec-94 /tmp/2404.1.all.q # cat machines
>>> lmexec-94
>>> lmexec-72
>>>
>>> but somehow, the rsh wrapping mechanism gets bypassed somewhere...
>>>
>>> I hope I have been clear in my replies so that it helps you to  
>>> help  me ;-)
>>>
>>> Thanks again for your support
>>>
>>> Jean-Paul
>>>
>>>> HTH  -Reuti
>>>> On 25.01.2006, at 15:42, Jean-Paul Minet wrote:
>>>>
>>>>> Reuti,
>>>>>
>>>>> I am totally lost with this tight integration...
>>>>>
>>>>> 1) as root user, if I use the -nolocal flag as mpirun argument,
>>>>> I end up with the following processes on the "master node":
>>>>>
>>>>> root      5349  5326  0 14:27 ?        00:00:00 bash /var/spool/sge/lmexec-121/job_scripts/2375
>>>>> root      5352  5349  0 14:27 ?        00:00:00 /bin/sh /usr/local/mpich-eth-intel/bin/mpirun -nolocal -np 2 -machinefile /tmp/2375.1.all.q/machines abinip_eth
>>>>> root      5446  5352  0 14:27 ?        00:00:00 /gridware/sge/bin/lx24-amd64/qrsh -inherit lmexec-62 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/
>>>>> root      5454  5446  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/rsh -p 32816 lmexec-62 exec '/gridware/sge/utilbin/lx24-amd64/qrsh_starter' '/v
>>>>> root      5455  5454  0 14:27 ?        00:00:00 [rsh] <defunct>
>>>>>
>>>>> and on the slave node:
>>>>>
>>>>> sgeadmin 14300  3464  0 14:27 ?        00:00:00 sge_shepherd-2375 -bg
>>>>> root     14301 14300  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/rshd -l
>>>>> root     14302 14301  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/qrsh_starter /var/spool/sge/lmexec-62/active_jobs/2375.1/1.lmexec-62
>>>>> root     14303 14302 27 14:27 ?        00:00:31 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI5352 -p4wd /home/pan/minet/abinit/paralle
>>>>> root     14304 14303  0 14:27 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI5352 -p4wd /home/pan/minet/abinit/paralle
>>>>> root     14305 14303  0 14:27 ?        00:00:00 /usr/bin/rsh lmexec-62 -l root -n /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 \-p4amslave \-p4yourname lmexec-62
>>>>> root     14306  3995  0 14:27 ?        00:00:00 in.rshd -aL
>>>>> root     14307 14306 86 14:27 ?        00:01:38 /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 -p4amslave -p4yourname lmexec-62 -p4rmrank 1
>>>>> root     14357 14307  0 14:27 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 -p4amslave -p4yourname lmexec-62 -p4rmrank 1
>>>>>
>>>>> So, I can see that, in a way, the SGE qrsh/rsh/qrsh_starter are
>>>>> coming into play and that the sge_shepherd is initiating the
>>>>> remote process. Nevertheless:
>>>>>
>>>>> - as expected, there is no local instance of the program running
>>>>> on the master node, which is not what we want.
>>>>> - the slave node issues an rsh onto itself; is that expected?
>>>>>
>>>>> Under these conditions, qstat -ext reports 0 usage (cpu/mem).
>>>>>
>>>>> If I don't use this -nolocal flag, then the rsh/qrsh wrapper
>>>>> mechanism doesn't seem to come into play, and the master node
>>>>> does a direct rsh to the slave node. Under these conditions,
>>>>> qstat -ext reports cpu time (from a single process, which is also
>>>>> expected since there is no SGE control in this case).
>>>>>
>>>>> All in all, I don't see how this -nolocal flag can make the rsh
>>>>> wrapper appear to work or fail.
>>>>>
>>>>> 2) as a non-root user, the first scenario doesn't work, as I get
>>>>> an "error: rcmd: permission denied". The second scenario works as
>>>>> for the root user.
>>>>>
>>>>> Quite a bit lost...
>>>>>
>>>>> Jean-Paul
>>>>>
>>>>> Reuti wrote:
>>>>>
>>>>>> Hi Jean-Paul,
>>>>>> On 23.01.2006, at 14:31, Jean-Paul Minet wrote:
>>>>>>
>>>>>>> Reuti,
>>>>>>>
>>>>>>>> for using qrsh the /etc/hosts.equiv isn't necessary. I set
>>>>>>>> this just to reflect the login node on all exec nodes to
>>>>>>>> allow interactive qrsh/qlogin sessions.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> OK, got this.
>>>>>>>
>>>>>>>> As qrsh will use a chosen port: any firewall and/or
>>>>>>>> /etc/hosts.(allow|deny) configured? - Reuti
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> No firewall nor hosts.xxx. The problem was from a wrong mode
>>>>>>> set on rsh/rlogin on the exec nodes (I had played with those
>>>>>>> following some hints for qrsh problem solving in the SGE FAQ,
>>>>>>> which probably messed up everything).
>>>>>>>
>>>>>>> MPI jobs can now run with qrsh... CPU time displayed by
>>>>>>> "qstat -ext" is no longer 0... but it corresponds to a single cpu!
>>>>>>>
>>>>>>> 2218 0.24170 0.24169 Test_abini minet        NA                  grppcpm    r 0:00:12:09 205.41073 0.00000 74727     0     0   71428   3298 0.04 all.q@lmexec-88                    2
>>>>>>>
>>>>>>> This job started about 12 minutes earlier, and runs on 2   
>>>>>>> cpus.    Shouldn't the displayed "cpu" be the sum of all cpu   
>>>>>>> times or is   this the correct behavior?
>>>>>>>
>>>>>>> Thks for your input
>>>>>>>
>>>>>> is "qstat -j 2218" giving you more reasonable results in the    
>>>>>> "usage  1:" line? As "qstat -g t -ext" will also display the   
>>>>>> CPU  time for  slave processes, these should be per process.  
>>>>>> -  Reuti
>>>>>>
>>>>>>> jp
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>
> -- 
> Jean-Paul Minet
> Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de  
> Masse
> Université Catholique de Louvain
> Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



