[GE users] tight integration problem

Reuti reuti at staff.uni-marburg.de
Thu Jan 26 10:30:28 GMT 2006



Hi,

On 26.01.2006 at 10:20, Jean-Paul Minet wrote:

> Reuti,
>
>> Correct, my remark was a question: whether it was mounted by
>> accident with nosuid, which would prevent the SUID bit from taking
>> effect, so it won't work. Sorry for the confusion.
>>> http://gridengine.sunsource.net/howto/commonproblems.html#interactive),
>>> there is some mention of SUID for utilbin rlogin and rsh. Anyway,
>>> I followed that, and I still get the error:
>>>
>>> qrsh -verbose -l mem_free=10M -l num_proc=2 -q all.q@lmexec-100 date
>>> your job 2402 ("date") has been submitted
>>> waiting for interactive job to be scheduled ...
>>> Your interactive job 2402 has been successfully scheduled.
>>> Establishing /gridware/sge/utilbin/lx24-amd64/rsh session to
>>> host lmexec-100 ...
>>> rcmd: socket: Permission denied
>>>
>> Mmh, okay, let's first look at the wrong $PATH. Can you put an
>> "echo $PATH" in your job script? Usually $TMPDIR is the first
>> entry in the created $PATH. If you change the $PATH, $TMPDIR must
>> again be put in first place to access the wrapper.
>
> I had done that already. Here is the output of the .o file for the
> same job, and I had seen that the PATH is correct (the /tmp/...
> comes first). What is weird is that this "buggy" behavior
> disappears if the -nolocal switch (even if inappropriate/
> undesirable) is added to the mpirun command. So maybe the problem
> comes from mpich?
>
> minet@lemaitre ~/abinit/parallel_eth > cat Test_abinip.o2404
> Environment:
> -------------
> ARC =  lx24-amd64
> SGE_ROOT =  /gridware/sge
> SGE_BINARY_PATH =  /gridware/sge/bin/lx24-amd64
> SGE_CELL =  default
> SGE_JOB_SPOOL_DIR =  /var/spool/sge/lmexec-94/active_jobs/2404.1
> SGE_O_HOME =  /home/pan/minet
> SGE_O_LOGNAME =  minet
> SGE_O_MAIL =
> SGE_O_PATH = /usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/pgi/linux86-64/6.0/bin:/mnt/optlm/bin:/home/pan/minet/bin:.:/opt/intel/cce/9.0/bin:/opt/intel/fce/9.0/bin:/gridware/sge/bin/lx24-amd64
> SGE_O_HOST =  lemaitre
> SGE_O_SHELL =  /bin/bash
> SGE_O_WORKDIR =  /home/pan/minet/abinit/parallel_eth
> SGE_STDERR_PATH = /home/pan/minet/abinit/parallel_eth/Test_abinip.e2404
> SGE_STDOUT_PATH = /home/pan/minet/abinit/parallel_eth/Test_abinip.o2404
> HOME =  /home/pan/minet
> NHOSTS =  2
> NQUEUES =  2
> PATH =  /tmp/2404.1.all.q:/usr/local/bin:/usr/ucb:/bin:/usr/bin:
> PE_HOSTFILE =  /var/spool/sge/lmexec-94/active_jobs/2404.1/pe_hostfile
> QUEUE =  all.q
> TMPDIR =  /tmp/2404.1.all.q
> Got 2 slots.
> Temp dir is /tmp/2404.1.all.q
> Node file is:
> lmexec-94
> lmexec-72
>

Looks okay - the $TMPDIR is in first place in the $PATH.

>> Another thing: what rsh statement is compiled in?
>
> Compiled into what? In .../bin/mpirun on each node, we have:
>
> RSHCOMMAND="rsh"
>

You can specify during ./configure, if you compile MPICH on your
own, which rsh statement should be compiled into the libraries as
the default. RSHCOMMAND is used during ./configure as a replacement
for the option -rsh=rsh (which I still prefer, as the complete
command line given to configure will also end up in the libs and so
document how the build was done - the RSHCOMMAND setting will not
show up there).
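
For illustration, if you build MPICH yourself, a configure call
along these lines would record the rsh choice (the prefix is only a
placeholder matching your install path, not taken from your build):

./configure -rsh=rsh -prefix=/usr/local/mpich-eth-intel
make
# the configure command line then ends up in the libraries:
strings /usr/local/mpich-eth-intel/lib/libmpich.a | grep rsh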

But back to your issue: RSHCOMMAND is, I think, only used for the
ch_p4 device if you enable that setting (judging from
mpirun.ch_p4.args). Better suited in your case is the environment
variable:

P4_RSHCOMMAND="rsh"
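
As a minimal job script sketch (the mpirun path and program name
are taken from your ps output, so adjust as needed):

#!/bin/sh
# keep $TMPDIR in first place in $PATH so the SGE rsh wrapper is found
export PATH=$TMPDIR:$PATH
# tell the ch_p4 device which rsh command to start the slaves with
export P4_RSHCOMMAND=rsh
/usr/local/mpich-eth-intel/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./abinip_eth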

>> strings abinip_eth | egrep "(rsh|ssh)"
>
> In what way would the user MPICH code use rsh to do something? I
> indeed see the following:
>
> minet@lemaitre ~/abinit/parallel_eth > strings abinip_eth | egrep "(rsh|ssh)"
>  rshift(icenter) =
>  Shapefunction is SIN type: shapef(r)=[sin(pi*r/rshp)/(pi*r/rshp)]**2
> rsh program (Some versions of Kerberos rsh have been observed to  
> have this
> /usr/bin/rsh
>
> Also, how would the same user code let the rsh wrapping mechanism
> work when the -nolocal switch is added?
>

Please have a look at "mpirun.ch_p4", lines 229 ff. In the case of
"nolocal", the $rshcmd will be passed to the first MPI process as
the rsh command to use. This is special to this case only.
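
To check the relevant section yourself (the exact line numbers may
differ between MPICH versions):

sed -n '229,245p' /usr/local/mpich-eth-intel/bin/mpirun.ch_p4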

> Now, thinking aloud... as for SGE, the /usr/local/mpich stuff is
> local to each node (no NFS mount)... and that's the only possible
> link I can see to the failure of the rsh wrapping mechanism when
> this unneeded -nolocal switch is removed... some environment
> variable could be different, some mpich flag or...? Again,
> wouldn't it be better and more practical/coherent to have this
> mpich directory NFS exported to all nodes?
>

Both should work okay. It's just a matter of personal taste, or of
whether you frequently rebuild the libs and want the changes to
show up on all nodes at once.
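
If you decide to share it after all, a minimal sketch would be an
entry like this in /etc/exports on the file server (the server name
and mount options are assumptions, adjust them to your site):

/usr/local/mpich-eth-intel  lmexec-*(ro)

plus a matching line in /etc/fstab on the exec nodes:

fileserver:/usr/local/mpich-eth-intel  /usr/local/mpich-eth-intel  nfs  ro  0 0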

-- Reuti

> jp
>
>> -- Reuti
>>> on lmexec-100, I have:
>>> -rwxr-xr-x  1 sgeadmin root  194380 Jul 22  2005 qrsh_starter
>>> -r-sr-xr-x  1 root     root   32607 Jul 22  2005 rlogin
>>> -r-sr-xr-x  1 root     root   22180 Jul 22  2005 rsh
>>> -rwxr-xr-x  1 sgeadmin root  218778 Jul 22  2005 rshd
>>>
>>> Here is the ps -ef output on the head node for an MPICH job
>>> (np=2) when -nolocal is not used:
>>>
>>> sgeadmin  1181  3492  0 08:43 ?        00:00:00 sge_shepherd-2404 -bg
>>> minet     1205  1181  0 08:43 ?        00:00:00 bash /var/spool/sge/lmexec-94/job_scripts/2404
>>> minet     1208  1205  0 08:43 ?        00:00:00 /bin/sh /usr/local/mpich-eth-intel/bin/mpirun -np 2 -machinefile /tmp/2404.1.all.q/machines abinip_e
>>> minet     1292  1208  0 08:43 ?        00:00:12 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI1208 -p4w
>>> minet     1293  1292  0 08:43 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI1208 -p4w
>>> minet     1294  1292  0 08:43 ?        00:00:00 /usr/bin/rsh lmexec-72 -l minet -n /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-94 32858 \-
>>>
>>> We indeed have the node doing some work... but the connection to
>>> the "slave" is through /usr/bin/rsh instead of qrsh. On this
>>> head node, we also have:
>>>
>>> lmexec-94 /tmp/2404.1.all.q # ls -al
>>> total 12
>>> drwxr-xr-x  2 minet grppcpm 4096 Jan 26 08:42 .
>>> drwxrwxrwt  9 root  root    4096 Jan 26 08:42 ..
>>> -rw-r--r--  1 minet grppcpm   20 Jan 26 08:42 machines
lrwxrwxrwx  1 minet grppcpm   21 Jan 26 08:42 rsh -> /gridware/sge/mpi/rsh
>>>
>>> and also:
>>> lmexec-94 /tmp/2404.1.all.q # cat machines
>>> lmexec-94
>>> lmexec-72
>>>
>>> but somehow, the rsh wrapping mechanism gets bypassed somewhere...
>>>
>>> I hope I have been clear in my replies, so that it helps you to
>>> help me ;-)
>>>
>>> Thanks again for your support
>>>
>>> Jean-Paul
>>>
>>>> HTH  -Reuti
>>>> On 25.01.2006 at 15:42, Jean-Paul Minet wrote:
>>>>
>>>>> Reuti,
>>>>>
>>>>> I am totally lost with this tight integration...
>>>>>
>>>>> 1) as the root user, if I use the -nolocal flag as an mpirun
>>>>> argument, I end up with the following processes on the "master
>>>>> node":
>>>>>
>>>>> root      5349  5326  0 14:27 ?        00:00:00 bash /var/spool/sge/lmexec-121/job_scripts/2375
>>>>> root      5352  5349  0 14:27 ?        00:00:00 /bin/sh /usr/local/mpich-eth-intel/bin/mpirun -nolocal -np 2 -machinefile /tmp/2375.1.all.q/machines abinip_eth
>>>>> root      5446  5352  0 14:27 ?        00:00:00 /gridware/sge/bin/lx24-amd64/qrsh -inherit lmexec-62 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/
>>>>> root      5454  5446  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/rsh -p 32816 lmexec-62 exec '/gridware/sge/utilbin/lx24-amd64/qrsh_starter' '/v
>>>>> root      5455  5454  0 14:27 ?        00:00:00 [rsh] <defunct>
>>>>>
>>>>> and on the slave node:
>>>>>
>>>>> sgeadmin 14300  3464  0 14:27 ?        00:00:00 sge_shepherd-2375 -bg
>>>>> root     14301 14300  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/rshd -l
>>>>> root     14302 14301  0 14:27 ?        00:00:00 /gridware/sge/utilbin/lx24-amd64/qrsh_starter /var/spool/sge/lmexec-62/active_jobs/2375.1/1.lmexec-62
>>>>> root     14303 14302 27 14:27 ?        00:00:31 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI5352 -p4wd /home/pan/minet/abinit/paralle
>>>>> root     14304 14303  0 14:27 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/parallel_eth/PI5352 -p4wd /home/pan/minet/abinit/paralle
>>>>> root     14305 14303  0 14:27 ?        00:00:00 /usr/bin/rsh lmexec-62 -l root -n /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 \-p4amslave \-p4yourname lmexec-62
>>>>> root     14306  3995  0 14:27 ?        00:00:00 in.rshd -aL
>>>>> root     14307 14306 86 14:27 ?        00:01:38 /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 -p4amslave -p4yourname lmexec-62 -p4rmrank 1
>>>>> root     14357 14307  0 14:27 ?        00:00:00 /home/pan/minet/abinit/parallel_eth/abinip_eth lmexec-62 32819 -p4amslave -p4yourname lmexec-62 -p4rmrank 1
>>>>>
>>>>> So, I can see that, in a way, the SGE qrsh/rsh/qrsh_starter
>>>>> are coming into play and that the sge_shepherd is initiating
>>>>> the remote process. Nevertheless:
>>>>>
>>>>> - as expected, there is no local instance of the program
>>>>> running on the master node, which is not what we want.
>>>>> - the slave node issues an rsh onto itself; is that expected?
>>>>>
>>>>> Under these conditions, qstat -ext reports 0 usage (cpu/mem).
>>>>>
>>>>> If I don't use this -nolocal flag, then the rsh/qrsh wrapper
>>>>> mechanism doesn't seem to come into play, and the master node
>>>>> does a direct rsh to the slave node. Under these conditions,
>>>>> qstat -ext reports cpu time (from a single process, which is
>>>>> also expected since there is no SGE control in this case).
>>>>>
>>>>> All in all, I don't see how this -nolocal flag can make the
>>>>> rsh wrapper appear to work or fail.
>>>>>
>>>>> 2) as a non-root user, the first scenario doesn't work, as I
>>>>> get an "error: rcmd: permission denied". The second scenario
>>>>> works as for the root user.
>>>>>
>>>>> Quite a bit lost...
>>>>>
>>>>> Jean-Paul
>>>>>
>>>>> Reuti wrote:
>>>>>
>>>>>> Hi Jean-Paul,
>>>>>> On 23.01.2006 at 14:31, Jean-Paul Minet wrote:
>>>>>>
>>>>>>> Reuti,
>>>>>>>
>>>>>>>> For using qrsh, the /etc/hosts.equiv isn't necessary. I set
>>>>>>>> this just to reflect the login node on all exec nodes, to
>>>>>>>> allow interactive qrsh/qlogin sessions.
>>>>>>>
>>>>>>> OK, got this.
>>>>>>>
>>>>>>>> As qrsh will use a chosen port: any firewall and/or
>>>>>>>> /etc/hosts.(allow|deny) configured? - Reuti
>>>>>>>
>>>>>>> No firewall nor hosts.xxx. The problem came from a wrong
>>>>>>> mode set on rsh/rlogin on the exec nodes (I had played with
>>>>>>> those following some hints for qrsh problem solving in the
>>>>>>> SGE FAQ, which probably messed up everything).
>>>>>>>
>>>>>>> MPI jobs can now run with qrsh... CPU time displayed by
>>>>>>> "qstat -ext" is no longer 0... but it corresponds to a
>>>>>>> single cpu!
>>>>>>>
>>>>>>> 2218 0.24170 0.24169 Test_abini minet NA grppcpm r 0:00:12:09 205.41073 0.00000 74727 0 0 71428 3298 0.04 all.q@lmexec-88 2
>>>>>>>
>>>>>>> This job started about 12 minutes earlier, and runs on 2
>>>>>>> cpus. Shouldn't the displayed "cpu" be the sum of all cpu
>>>>>>> times, or is this the correct behavior?
>>>>>>>
>>>>>>> Thanks for your input
>>>>>>>
>>>>>> is "qstat -j 2218" giving you more reasonable results in the    
>>>>>> "usage  1:" line? As "qstat -g t -ext" will also display the   
>>>>>> CPU  time for  slave processes, these should be per process.  
>>>>>> -  Reuti
>>>>>>
>>>>>>> jp
>
> -- 
> Jean-Paul Minet
> Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de  
> Masse
> Université Catholique de Louvain
> Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



