[GE users] tight integration problem

Jean-Paul Minet minet at cism.ucl.ac.be
Thu Jan 26 10:53:18 GMT 2006



Reuti,

>> Maybe we should re-install mpich with the proper rsh (i.e. without  
>> the full path)... still this wouldn't explain why the rsh wrapping  
>> works under certain conditions with the same user binary code.
>>
> 
> Well, this would also mean recompiling - or at least relinking - your
> application, as the .a libs are already in the executable. - Reuti

Yes, I have started doing that and will come back to the list with the results...
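
For the record, here is roughly what I intend to do for the rebuild (MPICH
1.2.x with the ch_p4 device; RSHCOMMAND is, as far as I understand, the
configure-time knob that decides which remote shell mpirun calls, and the
prefix is our existing install path - please correct me if this is wrong):

export RSHCOMMAND=rsh        # plain "rsh", so the $TMPDIR wrapper is picked up via $PATH
./configure --with-device=ch_p4 --prefix=/usr/local/mpich-eth-intel
make && make install
# then relink abinip_eth against the new MPICH libs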

What about the "permission denied" error with qrsh for normal users?
Besides the SUID bit on utilbin/rsh and utilbin/rlogin, are there other
requirements?  I have seen the utilbin/testsuidroot program.  Would that be
a useful testing tool?
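
In case it helps, this is what I would check on an exec node (the paths are
the ones that appear further down in this thread; the nosuid check refers to
your earlier remark about the mount options):

ls -l /gridware/sge/utilbin/lx24-amd64/rsh /gridware/sge/utilbin/lx24-amd64/rlogin
# both should show -r-sr-xr-x owned by root
mount | grep -i nosuid
# the filesystem holding $SGE_ROOT must not show up here, or the SUID bit has no effect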

What is the sequence of events (qrsh, qrsh_starter, sge_shepherd, ...)
when a qrsh is launched?
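
From the root/-nolocal "ps" listings quoted further down, the chain seems to
be roughly this (please correct me if I misread it):

head node:  sge_shepherd -> bash <job script> -> mpirun -> qrsh -inherit <slave> -> utilbin/rsh
slave node: sge_shepherd -> rshd -> qrsh_starter -> user binary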

Thanks again for your support

jp

>> jp
>>
>> Reuti wrote:
>>
>>> Hi,
>>> Am 26.01.2006 um 09:47 schrieb Jean-Paul Minet:
>>>
>>>> Reuti,
>>>>
>>>>> In short: don't use -nolocal! This will exclude the starting node
>>>>> from all starts of a MPICH process and will lead to an uneven
>>>>> process distribution. With SGE looking at the number of issued
>>>>> qrsh's, this might break the complete setup. And what you see is
>>>>> that a slave node is becoming the new master of the MPICH program.
>>>>> Looks okay, but of course wrong for the Tight Integration.
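
(For my own reference: I understand the PE definition is what makes this
qrsh accounting work. Something along these lines - field names as in
"qconf -sp", the mpi/ paths match the $SGE_ROOT=/gridware/sge layout shown
below, the remaining values are my guesses, so please correct me:)

pe_name            mpich
start_proc_args    /gridware/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args     /gridware/sge/mpi/stopmpi.sh
control_slaves     TRUE
job_is_first_task  TRUE   # mpirun runs the first process inside the job script itself, if I understand correctly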
>>>>
>>>>
>>>>
>>>> The fact that -nolocal was included in the submission script is
>>>> somewhat historical.  It is indeed clear to me that the -nolocal
>>>> switch is not desirable, since the MPICH "head-node" should indeed do
>>>> some work.  But, during my troubleshooting of the qrsh wrapping
>>>> scheme, it coincidentally happened that, while the -nolocal switch
>>>> was there, I could see some qrsh issued (and therefore could conclude
>>>> that the rsh redirection through the /tmp/... directory and the PATH
>>>> fiddling was working).  I then wanted to remove this offending
>>>> -nolocal switch, but then any reference to qrsh disappears... (see
>>>> below for ps -ef output)!
>>>>
>>>>> What do you mean by master-node? The master-node of the cluster or
>>>>> the "head-node" of the parallel job? With MPICH, also the
>>>>> "head-node" of the parallel job (which is actually a conventional
>>>>> exec node in the cluster) will do some work (therefore don't use
>>>>> -nolocal). This is the one you see in the "qstat" output (or named
>>>>> "MASTER" with "qstat -g t").
>>>>
>>>>
>>>>
>>>> I indeed meant the SGE MASTER, which is the MPICH head node.  There
>>>> was no intention at all to deliberately use -nolocal (it was there
>>>> just by accident...)
>>>>
>>>>> Can you please post the "ps -e f" output when not using -nolocal?
>>>>> You checked also in the default output file of the PE (.po) that
>>>>> the hostnames listed there are the ones which you get by the command
>>>>> "hostname" on the nodes? Otherwise MPICH might fail to subtract one
>>>>> process on the "head-node" of the parallel job.
>>>>>
>>>>> Using the Tight Integration, also the normal user can just use
>>>>> qrsh.  So, I'd suggest submitting a small script with the proper
>>>>> -pe request:
>>>>> #!/bin/sh
>>>>> cat $TMPDIR/machines
>>>>> sleep 120
>>>>> and checking on the head-node of the parallel job that the link to
>>>>> the rsh wrapper was created in the intended way in $TMPDIR.  Is
>>>>> $SGE_ROOT mounted nosuid on the nodes? This might explain that only
>>>>> root can do it.
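
(Assuming for instance a PE called "mpich" and the script above saved as
check_tmpdir.sh - both names made up - that would be:)

qsub -pe mpich 2 check_tmpdir.sh
# and, while the job sleeps, on the head node of the job:
ls -l /tmp/<job_id>.1.all.q/    # the rsh link should point to the wrapper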
>>>>
>>>>
>>>>
>>>> Before listing the ps -ef output, let me confirm that I checked
>>>> that, on the MPICH head-node, the /tmp/<job n°>/rsh symbolic link is
>>>> there, pointing to /gridware/sge/mpi/rsh, and that, in the same
>>>> temporary directory, the machine file seems correct (also confirmed
>>>> by the .po output file).  As far as qrsh is concerned, root can use
>>>> it interactively, but I just realize now that normal users get the
>>>> same error.  SGE was installed (by Sun on delivery of the cluster) on
>>>> each node (no remote mount), which seems not right to me; I would
>>>> prefer an NFS mount.  You mentioned earlier that no SUID is required
>>>> on specific SGE binaries, but in the SGE HowTos (
>>>
>>> Correct, my remark was a question whether it was mounted by accident
>>> with nosuid, which would prevent the SUID bit from taking effect, so
>>> it wouldn't work. Sorry for the confusion.
>>>
>>>> http://gridengine.sunsource.net/howto/commonproblems.html#interactive),
>>>> there is some mention of SUID for utilbin rlogin and rsh.  Anyway, I
>>>> followed that, and I still get the error:
>>>>
>>>> qrsh -verbose -l mem_free=10M -l num_proc=2 -q all.q@lmexec-100 date
>>>> your job 2402 ("date") has been submitted
>>>> waiting for interactive job to be scheduled ...
>>>> Your interactive job 2402 has been successfully scheduled.
>>>> Establishing /gridware/sge/utilbin/lx24-amd64/rsh session to host lmexec-100 ...
>>>> rcmd: socket: Permission denied
>>>>
>>> Mmh, okay, let's first look at the wrong $PATH. Can you put an
>>> "echo $PATH" in your jobscript? Usually $TMPDIR is the first entry in
>>> the created $PATH. If you change the $PATH, $TMPDIR must again be put
>>> in the first place to access the wrapper.
>>> Another thing: what rsh statement is compiled in?
>>> strings abinip_eth | egrep "(rsh|ssh)"
>>> -- Reuti
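
(To illustrate the $PATH point in my job script: whatever gets added to
PATH, $TMPDIR has to stay in front - the extra directory here is just a
made-up example:)

echo $PATH                                   # $TMPDIR should be listed first
export PATH=$TMPDIR:/opt/extra/bin:$PATH     # keep the wrapper directory in front when changing PATH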
>>>
>>>> on lmexec-100, I have:
>>>> -rwxr-xr-x  1 sgeadmin root  194380 Jul 22  2005 qrsh_starter
>>>> -r-sr-xr-x  1 root     root   32607 Jul 22  2005 rlogin
>>>> -r-sr-xr-x  1 root     root   22180 Jul 22  2005 rsh
>>>> -rwxr-xr-x  1 sgeadmin root  218778 Jul 22  2005 rshd
>>>>
>>>> Here is the ps -ef output on the head node for an MPICH job (np=2)
>>>> when -nolocal is not used:
>>>>
>>>> sgeadmin  1181  3492  0 08:43 ?        00:00:00 sge_shepherd-2404 -bg
>>>> minet     1205  1181  0 08:43 ?        00:00:00 bash /var/spool/sge/lmexec-94/job_scripts/2404
>>>> minet     1208  1205  0 08:43 ?        00:00:00 /bin/sh /usr/local/mpich-eth-intel/bin/mpirun -np 2 -machinefile /tmp/2404.1.all.q/machines abinip_e
>>>> minet     1292  1208  0 08:43 ?        00:00:12 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI1208 -p4w
>>>> minet     1293  1292  0 08:43 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI1208 -p4w
>>>> minet     1294  1292  0 08:43 ?        00:00:00 /usr/bin/rsh lmexec-72 -l minet -n /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-94 32858 \-
>>>>
>>>> We indeed have the node doing some work... but the connection to the
>>>> "slave" is through /usr/bin/rsh instead of qrsh.  On this head node,
>>>> we also have:
>>>>
>>>> lmexec-94 /tmp/2404.1.all.q # ls -al
>>>> total 12
>>>> drwxr-xr-x  2 minet grppcpm 4096 Jan 26 08:42 .
>>>> drwxrwxrwt  9 root  root    4096 Jan 26 08:42 ..
>>>> -rw-r--r--  1 minet grppcpm   20 Jan 26 08:42 machines
>>>> lrwxrwxrwx  1 minet grppcpm   21 Jan 26 08:42 rsh -> /gridware/sge/mpi/rsh
>>>>
>>>> and also:
>>>> lmexec-94 /tmp/2404.1.all.q # cat machines
>>>> lmexec-94
>>>> lmexec-72
>>>>
>>>> but somehow, the rsh wrapping mechanism gets bypassed somewhere...
>>>>
>>>> I hope I have been clear in my replies so that it helps you to  
>>>> help  me ;-)
>>>>
>>>> Thanks again for your support
>>>>
>>>> Jean-Paul
>>>>
>>>>> HTH  -Reuti
>>>>> Am 25.01.2006 um 15:42 schrieb Jean-Paul Minet:
>>>>>
>>>>>> Reuti,
>>>>>>
>>>>>> I am totally lost with this tight integration...
>>>>>>
>>>>>> 1) as root user, if I use the -nolocal flag as an mpirun argument,
>>>>>> I end up with the following processes on the "master node":
>>>>>>
>>>>>> root      5349  5326  0 14:27 ?        00:00:00 bash /var/spool/sge/lmexec-121/job_scripts/2375
>>>>>> root      5352  5349  0 14:27 ?        00:00:00 /bin/sh /usr/local/mpich-eth-intel/bin/mpirun -nolocal -np 2 -machinefile /tmp/2375.1.all.q/machines abinip_eth
>>>>>> root      5446  5352  0 14:27 ?        00:00:00 /gridware/sge/bin/lx24-amd64/qrsh -inherit lmexec-62 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/
>>>>>> root      5454  5446  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/rsh -p 32816 lmexec-62 exec '/gridware/sge/utilbin/lx24-amd64/qrsh_starter' '/v
>>>>>> root      5455  5454  0 14:27 ?        00:00:00 [rsh] <defunct>
>>>>>>
>>>>>> and on the slave node:
>>>>>>
>>>>>> sgeadmin 14300  3464  0 14:27 ?        00:00:00 sge_shepherd-2375 -bg
>>>>>> root     14301 14300  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/rshd -l
>>>>>> root     14302 14301  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/qrsh_starter /var/spool/sge/lmexec-62/active_jobs/2375.1/1.lmexec-62
>>>>>> root     14303 14302 27 14:27 ?        00:00:31 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI5352 -p4wd /home/pan/minet/abinit/paralle
>>>>>> root     14304 14303  0 14:27 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI5352 -p4wd /home/pan/minet/abinit/paralle
>>>>>> root     14305 14303  0 14:27 ?        00:00:00 /usr/bin/rsh lmexec-62 -l root -n /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 \-p4amslave \-p4yourname lmexec-62
>>>>>> root     14306  3995  0 14:27 ?        00:00:00 in.rshd -aL
>>>>>> root     14307 14306 86 14:27 ?        00:01:38 /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 -p4amslave -p4yourname lmexec-62 -p4rmrank 1
>>>>>> root     14357 14307  0 14:27 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 -p4amslave -p4yourname lmexec-62 -p4rmrank 1
>>>>>>
>>>>>> So, I can see that, in a way, the SGE qrsh/rsh/qrsh_starter are
>>>>>> coming into play and that the sge_shepherd is initiating the remote
>>>>>> process.  Nevertheless:
>>>>>>
>>>>>> - as expected, there is no local instance of the program running on
>>>>>> the master node, which is not what we want.
>>>>>> - the slave node issues an rsh onto itself; is that expected?
>>>>>>
>>>>>> Under these conditions, qstat -ext reports 0 usage (cpu/mem).
>>>>>>
>>>>>> If I don't use this -nolocal flag, then the rsh/qrsh wrapper
>>>>>> mechanism doesn't seem to come into play, and the master node does
>>>>>> a direct rsh to the slave node.  In these conditions, qstat -ext
>>>>>> reports CPU time (from a single process, which is also expected
>>>>>> since there is no SGE control in this case).
>>>>>>
>>>>>> All in all, I don't see how this -nolocal flag can make the  rsh   
>>>>>> wrapper appear to work or fail.
>>>>>>
>>>>>> 2) as a non-root user, the first scenario doesn't work, as I get an
>>>>>> "error:rcmd: permission denied".  The second scenario works as for
>>>>>> the root user.
>>>>>>
>>>>>> Quite a bit lost...
>>>>>>
>>>>>> Jean-Paul
>>>>>>
>>>>>> Reuti wrote:
>>>>>>
>>>>>>> Hi Jean-Paul,
>>>>>>> Am 23.01.2006 um 14:31 schrieb Jean-Paul Minet:
>>>>>>>
>>>>>>>> Reuti,
>>>>>>>>
>>>>>>>>> for using qrsh the /etc/hosts.equiv isn't necessary. I set this
>>>>>>>>> just to reflect the login node on all exec nodes to allow
>>>>>>>>> interactive qrsh/qlogin sessions.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> OK, got this.
>>>>>>>>
>>>>>>>>> As qrsh will use a chosen port: any firewall and/or
>>>>>>>>> /etc/hosts.(allow|deny) configured? - Reuti
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> No firewall nor hosts.xxx.  The problem came from wrong modes set
>>>>>>>> on rsh/rlogin on the exec nodes (I had played with those following
>>>>>>>> some hints for qrsh problem solving in the SGE FAQ, which probably
>>>>>>>> messed everything up).
>>>>>>>>
>>>>>>>> MPI jobs can now run with qrsh... CPU time displayed by
>>>>>>>> "qstat -ext" is no longer 0... but it corresponds to a single CPU!
>>>>>>>>
>>>>>>>> 2218 0.24170 0.24169 Test_abini minet  NA  grppcpm  r  0:00:12:09  205.41073  0.00000  74727  0  0  71428  3298  0.04  all.q@lmexec-88  2
>>>>>>>>
>>>>>>>> This job started about 12 minutes earlier and runs on 2 CPUs.
>>>>>>>> Shouldn't the displayed "cpu" be the sum of all CPU times, or is
>>>>>>>> this the correct behavior?
>>>>>>>>
>>>>>>>> Thanks for your input
>>>>>>>>
>>>>>>> is "qstat -j 2218" giving you more reasonable results in the    
>>>>>>> "usage  1:" line? As "qstat -g t -ext" will also display the   
>>>>>>> CPU  time for  slave processes, these should be per process.  -  
>>>>>>> Reuti
>>>>>>>
>>>>>>>> jp
>>>>>>>>

-- 
Jean-Paul Minet
Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
Université Catholique de Louvain
Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



