[GE users] tight integration problem

Reuti reuti at staff.uni-marburg.de
Thu Jan 26 09:01:15 GMT 2006



Hi,

On 26.01.2006 at 09:47, Jean-Paul Minet wrote:

> Reuti,
>
>> In short: don't use -nolocal! It excludes the starting node from running any of the MPICH processes and leads to an uneven process distribution. Since SGE looks at the number of issued qrsh calls, this might break the complete setup. And what you see is that a slave node becomes the new master of the MPICH program. It looks okay, but is of course wrong for the Tight Integration.
>
> The fact that -nolocal was included in the submission script is somehow historical. It is indeed clear to me that the -nolocal switch is not desirable if the MPICH "head node" is to do some work. But during my troubleshooting of the qrsh wrapping scheme, it coincidentally happened that, while the -nolocal switch was there, I could see some qrsh being issued (and could therefore conclude that the rsh redirection through the /tmp/... directory and the PATH fiddling was working). I then wanted to remove this offending -nolocal switch, but then any reference to qrsh disappears... (see below for the ps -ef output)!
>
>> What do you mean by master node? The master node of the cluster or the "head node" of the parallel job? With MPICH, the "head node" of the parallel job (which is actually a conventional exec node in the cluster) will also do some work (therefore don't use -nolocal). This is the one you see in the "qstat" output (or named "MASTER" with "qstat -g t").
>
> I indeed meant the SGE MASTER, which is the MPICH head node. No intention at all to deliberately use -nolocal (it was there just by accident...).
>
>> Can you please post the "ps -e f" output when not using -nolocal? Did you also check in the default output file of the PE (.po) that the hostnames listed there are the ones you get from the command "hostname" on the nodes? Otherwise MPICH might fail to subtract one process on the "head node" of the parallel job.
>>
>> Using the Tight Integration, a normal user can also just use qrsh. So, I'd suggest submitting a small script with the proper -pe request:
>> #!/bin/sh
>> cat $TMPDIR/machines
>> sleep 120
>> and checking on the head node of the parallel job that the link to the rsh wrapper was created in the intended way in $TMPDIR. Is $SGE_ROOT mounted nosuid on the nodes? This might explain why only root can do it.
>
> Before listing the ps -ef output, let me confirm that I checked that, on the MPICH head node, the /tmp/<job number>/rsh symbolic link is there, pointing to /sge/gridware/mpi/bin/rsh, and that, in the same temporary directory, the machines file seems correct (also confirmed by the .po output file). As far as qrsh is concerned, root can use it interactively, but I realize just now that normal users get the same error. SGE was installed (by Sun on delivery of the cluster) on each node (no remote mount), and this seems not right to me; I would prefer an NFS mount. You mentioned earlier that no SUID is required on specific SGE binaries, but in the SGE HowTos (

Correct; my remark was a question whether it was by accident mounted with nosuid, which would prevent the SUID bit from taking effect, and so it wouldn't work. Sorry for the confusion.
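In case it were mounted via NFS, a quick check on a node would be something like this (just a sketch; /gridware as the mount point is only my guess from your paths, and with a purely local installation there is of course nothing to check):

mount | grep gridware          # look for "nosuid" among the mount options
grep gridware /etc/fstab       # or inspect the fstab entry directly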

> http://gridengine.sunsource.net/howto/commonproblems.html#interactive ),
> there is some mention of SUID for the utilbin rlogin and rsh. Anyway, I followed that, and I still get the error:
>
> qrsh -verbose -l mem_free=10M -l num_proc=2 -q all.q@lmexec-100 date
> your job 2402 ("date") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 2402 has been successfully scheduled.
> Establishing /gridware/sge/utilbin/lx24-amd64/rsh session to host lmexec-100 ...
> rcmd: socket: Permission denied
>

Mmh, okay, let's first look at a possibly wrong $PATH. Can you put an "echo $PATH" into your job script? Usually $TMPDIR is the first entry in the $PATH that SGE creates. If you change the $PATH, $TMPDIR must again be put in first place so that the rsh wrapper is found.
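As a rough sketch of what I mean (the mpirun line is only guessed from your ps output, so the paths are an assumption on my side):

#!/bin/sh
echo $PATH                      # $TMPDIR should be the first entry here
# If the script changes PATH (e.g. for compilers), put $TMPDIR back in front,
# so that the wrapper $TMPDIR/rsh is found before /usr/bin/rsh:
PATH=$TMPDIR:$PATH
export PATH
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./abinip_eth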

Another thing: which rsh command is compiled into the binary?

strings abinip_eth | egrep "(rsh|ssh)"

-- Reuti


> on lmexec-100, I have:
> -rwxr-xr-x  1 sgeadmin root  194380 Jul 22  2005 qrsh_starter
> -r-sr-xr-x  1 root     root   32607 Jul 22  2005 rlogin
> -r-sr-xr-x  1 root     root   22180 Jul 22  2005 rsh
> -rwxr-xr-x  1 sgeadmin root  218778 Jul 22  2005 rshd
>
> Here is the ps -ef output on the head node for an MPICH job (np=2) when -nolocal is not used:
>
> sgeadmin  1181  3492  0 08:43 ?        00:00:00 sge_shepherd-2404 -bg
> minet     1205  1181  0 08:43 ?        00:00:00 bash /var/spool/sge/lmexec-94/job_scripts/2404
> minet     1208  1205  0 08:43 ?        00:00:00 /bin/sh /usr/local/mpich-eth-intel/bin/mpirun -np 2 -machinefile /tmp/2404.1.all.q/machines abinip_e
> minet     1292  1208  0 08:43 ?        00:00:12 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI1208 -p4w
> minet     1293  1292  0 08:43 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI1208 -p4w
> minet     1294  1292  0 08:43 ?        00:00:00 /usr/bin/rsh lmexec-72 -l minet -n /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-94 32858 \-
>
> We indeed have the node doing some work... but the connection to the "slave" is through /usr/bin/rsh instead of qrsh. On this head node, we also have:
>
> lmexec-94 /tmp/2404.1.all.q # ls -al
> total 12
> drwxr-xr-x  2 minet grppcpm 4096 Jan 26 08:42 .
> drwxrwxrwt  9 root  root    4096 Jan 26 08:42 ..
> -rw-r--r--  1 minet grppcpm   20 Jan 26 08:42 machines
> lrwxrwxrwx  1 minet grppcpm   21 Jan 26 08:42 rsh -> /gridware/sge/mpi/rsh
>
> and also:
> lmexec-94 /tmp/2404.1.all.q # cat machines
> lmexec-94
> lmexec-72
>
> but somehow the rsh wrapping mechanism gets bypassed somewhere...
>
> I hope I have been clear in my replies so that it helps you to help  
> me ;-)
>
> Thanks again for your support
>
> Jean-Paul
>
>> HTH  -Reuti
>> On 25.01.2006 at 15:42, Jean-Paul Minet wrote:
>>> Reuti,
>>>
>>> I am totally lost with this tight integration...
>>>
>>> 1) as root user, if I use the -nolocal flag as an mpirun argument, I end up with the following processes on the "master node":
>>>
>>> root      5349  5326  0 14:27 ?        00:00:00 bash /var/spool/sge/lmexec-121/job_scripts/2375
>>> root      5352  5349  0 14:27 ?        00:00:00 /bin/sh /usr/local/mpich-eth-intel/bin/mpirun -nolocal -np 2 -machinefile /tmp/2375.1.all.q/machines abinip_eth
>>> root      5446  5352  0 14:27 ?        00:00:00 /gridware/sge/bin/lx24-amd64/qrsh -inherit lmexec-62 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/
>>> root      5454  5446  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/rsh -p 32816 lmexec-62 exec '/gridware/sge/utilbin/lx24-amd64/qrsh_starter' '/v
>>> root      5455  5454  0 14:27 ?        00:00:00 [rsh] <defunct>
>>>
>>> and on the slave node:
>>>
>>> sgeadmin 14300  3464  0 14:27 ?        00:00:00 sge_shepherd-2375 -bg
>>> root     14301 14300  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/rshd -l
>>> root     14302 14301  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/qrsh_starter /var/spool/sge/lmexec-62/active_jobs/2375.1/1.lmexec-62
>>> root     14303 14302 27 14:27 ?        00:00:31 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI5352 -p4wd /home/pan/minet/abinit/paralle
>>> root     14304 14303  0 14:27 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI5352 -p4wd /home/pan/minet/abinit/paralle
>>> root     14305 14303  0 14:27 ?        00:00:00 /usr/bin/rsh lmexec-62 -l root -n /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 \-p4amslave \-p4yourname lmexec-62
>>> root     14306  3995  0 14:27 ?        00:00:00 in.rshd -aL
>>> root     14307 14306 86 14:27 ?        00:01:38 /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 -p4amslave -p4yourname lmexec-62 -p4rmrank 1
>>> root     14357 14307  0 14:27 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 -p4amslave -p4yourname lmexec-62 -p4rmrank 1
>>>
>>> So, I can see that, in a way, the SGE qrsh/rsh/qrsh_starter are coming into play and that the sge_shepherd is initiating the remote process. Nevertheless:
>>>
>>> - as expected, there is no local instance of the program running on the master node, which is not what we want.
>>> - the slave node issues an rsh onto itself; is that expected?
>>>
>>> Under these conditions, qstat -ext reports 0 usage (cpu/mem).
>>>
>>> If I don't use this -nolocal flag, then the rsh/qrsh wrapper mechanism doesn't seem to come into play, and the master node does a direct rsh to the slave node. Under these conditions, qstat -ext reports CPU time (from a single process, which is also expected since there is no SGE control in this case).
>>>
>>> All in all, I don't see how this -nolocal flag can make the rsh wrapper appear to work or fail.
>>>
>>> 2) as a non-root user, the first scenario doesn't work, as I get an "error:rcmd: permission denied". The second scenario works as for the root user.
>>>
>>> Quite a bit lost...
>>>
>>> Jean-Paul
>>>
>>> Reuti wrote:
>>>
>>>> Hi Jean-Paul,
>>>> On 23.01.2006 at 14:31, Jean-Paul Minet wrote:
>>>>
>>>>> Reuti,
>>>>>
>>>>>> For using qrsh, the /etc/hosts.equiv isn't necessary. I set this just to reflect the login node on all exec nodes, to allow interactive qrsh/qlogin sessions.
>>>>>
>>>>>
>>>>>
>>>>> OK, got this.
>>>>>
>>>>>> As qrsh will use a chosen port: is any firewall and/or /etc/hosts.(allow|deny) configured? - Reuti
>>>>>
>>>>>
>>>>>
>>>>> No firewall or hosts.xxx. The problem came from a wrong mode set on rsh/rlogin on the exec nodes (I had played with those following some hints for qrsh problem solving in the SGE FAQ, which probably messed everything up).
>>>>>
>>>>> MPI jobs can now run with qrsh... the CPU time displayed by "qstat -ext" is no longer 0... but it corresponds to a single CPU!
>>>>>
>>>>> 2218 0.24170 0.24169 Test_abini minet        NA                  grppcpm    r 0:00:12:09 205.41073 0.00000 74727     0     0   71428   3298 0.04  all.q@lmexec-88                    2
>>>>>
>>>>> This job started about 12 minutes earlier and runs on 2 CPUs. Shouldn't the displayed "cpu" be the sum of all CPU times, or is this the correct behavior?
>>>>>
>>>>> Thanks for your input
>>>>>
>>>> is "qstat -j 2218" giving you more reasonable results in the   
>>>> "usage  1:" line? As "qstat -g t -ext" will also display the  
>>>> CPU  time for  slave processes, these should be per process. -  
>>>> Reuti
>>>>
>>>>> jp
>>>>>
>>>>
>>>
>>>
>>> -- 
>>> Jean-Paul Minet
>>> Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
>>> Université Catholique de Louvain
>>> Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
>>>
>
> -- 
> Jean-Paul Minet
> Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
> Université Catholique de Louvain
> Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



