[GE users] tight integration problem

Reuti reuti at staff.uni-marburg.de
Wed Jan 25 20:44:52 GMT 2006



Hi Jean-Paul,

please have a look at the Howto:

http://gridengine.sunsource.net/howto/mpich-integration.html

In short: don't use -nolocal! This flag excludes the starting node
from running any MPICH process and leads to an uneven process
distribution. Since SGE counts the number of issued qrsh calls, this
may break the complete setup. What you see then is that a slave node
becomes the new master of the MPICH program. It looks okay, but is of
course wrong for the Tight Integration.
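The Howto boils down to a job script like the following minimal sketch (the PE name "mpich", the mpirun path, and the binary name are assumptions for illustration; adapt them to your site). Note there is no -nolocal, so the head node of the job runs a process too:

```shell
#!/bin/sh
# Hypothetical SGE job script for a tight MPICH integration.
# Submit with e.g.:  qsub -pe mpich 4 job.sh
# $NSLOTS and $TMPDIR/machines are provided by SGE's PE setup,
# and $TMPDIR also holds the rsh wrapper link, so mpirun's rsh
# calls are turned into qrsh -inherit calls.
/usr/local/mpich-eth-intel/bin/mpirun -np $NSLOTS \
    -machinefile $TMPDIR/machines ./abinip_eth
```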

What do you mean by master node? The master node of the cluster, or
the "head node" of the parallel job? With MPICH, the "head node" of
the parallel job (which is actually an ordinary exec node in the
cluster) will also do some work (hence: don't use -nolocal). This is
the node you see in the "qstat" output (or labeled "MASTER" in
"qstat -g t").

Can you please post the "ps -e f" output when not using -nolocal? Did
you also check in the default output file of the PE (.po) that the
hostnames listed there are the ones you get from the command
"hostname" on the nodes? Otherwise MPICH may fail to subtract one
process for the "head node" of the parallel job.
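MPICH compares the machinefile entries literally against the local hostname when deciding which entry is the head node, so a fully qualified name in the file versus a short name from "hostname" (or vice versa) is enough to break the "subtract one process" logic. A minimal sketch of the comparison (the file name and hostnames are made up for illustration):

```shell
#!/bin/sh
# Illustrative only: normalize a possibly fully-qualified hostname to
# its short form, the way you'd compare machinefile entries against
# the output of `hostname` on the nodes.
short() { echo "${1%%.*}"; }

# Example machinefile content as it might appear in $TMPDIR/machines:
printf 'lmexec-62.example.org\nlmexec-88.example.org\n' > machines.example

# Print the short form of every listed host for comparison:
while read h; do
    short "$h"
done < machines.example
```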

With the Tight Integration, a normal user can also just use qrsh. So
I'd suggest submitting a small script with the proper -pe request:

#!/bin/sh
cat $TMPDIR/machines
sleep 120

and checking on the head node of the parallel job that the link to
the rsh wrapper was created as intended in $TMPDIR. Is $SGE_ROOT
mounted nosuid on the nodes? That might explain why only root can do
it.
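A quick way to run both checks; the paths shown are examples, and the nosuid helper is plain shell that just inspects a mount-option string:

```shell
#!/bin/sh
# Sketch of the two checks. On a node you would feed check_nosuid the
# option string from `mount` for the filesystem holding $SGE_ROOT; a
# "nosuid" flag there stops SGE's setuid rsh helper from binding a
# privileged port for ordinary users, while root still succeeds.
check_nosuid() {
    case ",$1," in
        *,nosuid,*) echo "nosuid" ;;
        *)          echo "ok" ;;
    esac
}

check_nosuid "rw,nosuid,nodev"
check_nosuid "rw,relatime"

# And inside a running job, on its head node:
#   ls -l $TMPDIR/rsh    # should be a link to SGE's rsh wrapper
```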

HTH  -Reuti


Am 25.01.2006 um 15:42 schrieb Jean-Paul Minet:

> Reuti,
>
> I am totally lost with this tight integration...
>
> 1) as root user, if I use the -nolocal flag as mpirun argument, I  
> end up with the following process on the "master node":
>
> root      5349  5326  0 14:27 ?        00:00:00 bash /var/spool/sge/ 
> lmexec-121/job_scripts/2375
> root      5352  5349  0 14:27 ?        00:00:00 /bin/sh /usr/local/ 
> mpich-eth-intel/bin/mpirun -nolocal -np 2 -machinefile /tmp/ 
> 2375.1.all.q/machines abinip_eth
> root      5446  5352  0 14:27 ?        00:00:00 /gridware/sge/bin/ 
> lx24-amd64/qrsh -inherit lmexec-62 /home/pan/minet/abinit/ 
> parallel_eth/abinip_eth -p4pg /home/
> root      5454  5446  0 14:27 ?        00:00:00 /gridware/sge/ 
> utilbin/lx24-amd64/rsh -p 32816 lmexec-62 exec '/gridware/sge/ 
> utilbin/lx24-amd64/qrsh_starter' '/v
> root      5455  5454  0 14:27 ?        00:00:00 [rsh] <defunct>
>
> and on the slave node:
>
> sgeadmin 14300  3464  0 14:27 ?        00:00:00 sge_shepherd-2375 -bg
> root     14301 14300  0 14:27 ?        00:00:00 /gridware/sge/ 
> utilbin/lx24-amd64/rshd -l
> root     14302 14301  0 14:27 ?        00:00:00 /gridware/sge/ 
> utilbin/lx24-amd64/qrsh_starter /var/spool/sge/lmexec-62/ 
> active_jobs/2375.1/1.lmexec-62
> root     14303 14302 27 14:27 ?        00:00:31 /home/pan/minet/ 
> abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/ 
> parallel_eth/PI5352 -p4wd /home/pan/minet/abinit/paralle
> root     14304 14303  0 14:27 ?        00:00:00 /home/pan/minet/ 
> abinit/parallel_eth/abinip_eth -p4pg /home/pan/minet/abinit/ 
> parallel_eth/PI5352 -p4wd /home/pan/minet/abinit/paralle
> root     14305 14303  0 14:27 ?        00:00:00 /usr/bin/rsh  
> lmexec-62 -l root -n /home/pan/minet/abinit/parallel_eth/abinip_eth  
> lmexec-62 32819 \-p4amslave \-p4yourname lmexec-62
> root     14306  3995  0 14:27 ?        00:00:00 in.rshd -aL
> root     14307 14306 86 14:27 ?        00:01:38 /home/pan/minet/ 
> abinit/parallel_eth/abinip_eth lmexec-62 32819 -p4amslave - 
> p4yourname lmexec-62 -p4rmrank 1
> root     14357 14307  0 14:27 ?        00:00:00 /home/pan/minet/ 
> abinit/parallel_eth/abinip_eth lmexec-62 32819 -p4amslave - 
> p4yourname lmexec-62 -p4rmrank 1
>
> So, I can see that, in a way, the SGE qrsh/rsh/qrsh_starter are
> coming into play and that the sge_shepherd is initiating the remote
> process.  Nevertheless:
>
> - as expected, there is no local instance of the program running on
> the master node, which is not what we want.
> - the slave node issues an rsh onto itself; is that expected?
>
> Under these conditions, qstat -ext reports 0 usage (cpu/mem).
>
> If I don't use this -nolocal flag, then the rsh/qrsh wrapper
> mechanism doesn't seem to come into play, and the master node does a
> direct rsh to the slave node. Under these conditions, qstat -ext
> reports cpu time (from a single process, which is also expected
> since there is no SGE control in this case).
>
> All in all, I don't see how this -nolocal flag can make the rsh  
> wrapper appear to work or fail.
>
> 2) as a non-root user, the first scenario doesn't work, as I get an
> "error: rcmd: permission denied".  The second scenario works as for
> the root user.
>
> Quite a bit lost...
>
> Jean-Paul
>
> Reuti wrote:
>> Hi Jean-Paul,
>> Am 23.01.2006 um 14:31 schrieb Jean-Paul Minet:
>>> Reuti,
>>>
>>>> for using qrsh the /etc/hosts.equiv isn't necessary. I set this   
>>>> just  to reflect the login node on all exec nodes to allow   
>>>> interactive qrsh/ qlogin sessions.
>>>
>>>
>>> OK, got this.
>>>
>>>> As qrsh will use a chosen port: any firewall and/or etc/hosts.  
>>>> (allow| deny) configured? - Reuti
>>>
>>>
>>> No firewall nor hosts.xxx.  The problem came from wrong modes set
>>> on rsh/rlogin on the exec nodes (I had played with those following
>>> some hints for qrsh troubleshooting in the SGE FAQ, which probably
>>> messed everything up).
>>>
>>> MPI jobs can now run with qrsh... CPU time displayed by "qstat -  
>>> ext" is no longer 0... but it corresponds to a single cpu!
>>>
>>> 2218 0.24170 0.24169 Test_abini minet        NA                 
>>> grppcpm    r 0:00:12:09 205.41073 0.00000 74727     0     0  
>>> 71428   3298 0.04  all.q at lmexec-88                    2
>>>
>>> This job started about 12 minutes earlier, and runs on 2 cpus.    
>>> Shouldn't the displayed "cpu" be the sum of all cpu times or is   
>>> this the correct behavior?
>>>
>>> Thks for your input
>>>
>> is "qstat -j 2218" giving you more reasonable results in the  
>> "usage  1:" line? As "qstat -g t -ext" will also display the CPU  
>> time for  slave processes, these should be per process. - Reuti
>>> jp
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
> -- 
> Jean-Paul Minet
> Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de  
> Masse
> Université Catholique de Louvain
> Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
>

