[GE users] Qdel problem

Reuti reuti at staff.uni-marburg.de
Tue Oct 3 20:37:51 BST 2006


On 03.10.2006 at 21:25, Liang Ge wrote:

> On 10/3/06, Reuti <reuti at staff.uni-marburg.de> wrote:
>> <snip>
>>
>> Can you check a running program with "ps -e f" to have a look at the
>> process tree - are all bound to sge_shepherd on the slaves?
>
> I think the answer is yes. Here is the output of "ps -e f"
> 1565 ?        S      0:33 /opt/sge/bin/lx24-amd64/sge_execd
> 5575 ?        S      0:00  \_ sge_shepherd-833 -bg
> 5584 ?        Ss     0:00      \_ bash
> /opt/sge/default/spool/node0046/job_scripts/833
> 5585 ?        S      0:00          \_ perl -S -w
> /opt/mpich-mx.gcc/bin/mpirun.ch_mx.pl --mx-kill 5 -np 8 -machinefile
> /opt/sge/tmp/833.1.all.q/machines /home
> 5614 ?        S      0:00              \_ perl -S -w
> /opt/mpich-mx.gcc/bin/mpirun.ch_mx.pl --mx-kill 5 -np 8 -machinefile
> /opt/sge/tmp/833.1.all.q/machines /
> 5615 ?        S      0:24              \_ rsh node0008 cd
> /home/lg65/JCP/DrivenCavity/Re1000/2D_33 && exec env
> MXMPI_MASTER=node0046 MXMPI_PORT=50349 MX_DIS
> 5667 ?        Z      0:00              |   \_ [rsh] <defunct>
> 5616 ?        S      0:00              \_ rsh node0008 -n cd
> /home/lg65/JCP/DrivenCavity/Re1000/2D_33 && exec env
> MXMPI_MASTER=node0046 MXMPI_PORT=50349 MX_
> 5617 ?        S      0:00              \_ rsh node0008 -n cd
> /home/lg65/JCP/DrivenCavity/Re1000/2D_33 && exec env
> MXMPI_MASTER=node0046 MXMPI_PORT=50349 MX_
> 5618 ?        S      0:00              \_ rsh node0008 -n cd
> /home/lg65/JCP/DrivenCavity/Re1000/2D_33 && exec env
> MXMPI_MASTER=node0046 MXMPI_PORT=50349 MX_
> 5619 ?        S      0:00              \_ rsh node0046 -n cd
> /home/lg65/JCP/DrivenCavity/Re1000/2D_33 && exec env
> MXMPI_MASTER=node0046 MXMPI_PORT=50349 MX_
> 5620 ?        S      0:00              \_ rsh node0046 -n cd
> /home/lg65/JCP/DrivenCavity/Re1000/2D_33 && exec env
> MXMPI_MASTER=node0046 MXMPI_PORT=50349 MX_
> 5621 ?        S      0:00              \_ rsh node0046 -n cd
> /home/lg65/JCP/DrivenCavity/Re1000/2D_33 && exec env
> MXMPI_MASTER=node0046 MXMPI_PORT=50349 MX_
> 5622 ?        S      0:00              \_ rsh node0046 -n cd
> /home/lg65/JCP/DrivenCavity/Re1000/2D_33 && exec env
> MXMPI_MASTER=node0046 MXMPI_PORT=50349 MX_
> 1587 ?        Ss     0:00 /usr/sbin/sshd
> 1602 ?        Ss     0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
> 5623 ?        Ss     0:00  \_ in.rshd
> 5627 ?        Rl    33:49  |   \_

This shouldn't be the case: the process under in.rshd was started by
xinetd, so somehow the rsh wrapper isn't being used. Can you please
do two things:

1. Put an:

echo $PATH

before the mpirun command to check the $PATH.

2. Although the $PATH created by SGE should be okay, you can try putting
$TMPDIR, which contains the link to the wrapper, in front of it (maybe
you define $PATH somewhere on your own, which overrides the one set by
SGE):

export PATH=$TMPDIR:$PATH
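
Both checks can be combined at the top of the job script, before the mpirun line. This is only a sketch: it assumes that the link to the wrapper in $TMPDIR is named "rsh", and the fallback to /tmp is there only so that the snippet also runs outside of SGE, where $TMPDIR may be unset.

```shell
#!/bin/sh
# Assumption: SGE sets $TMPDIR for the job and the wrapper link in it is
# named "rsh". The /tmp fallback is only for trying this outside of SGE.
: "${TMPDIR:=/tmp}"

# Check 1: show the $PATH the job script actually sees.
echo "PATH before: $PATH"

# Check 2: put $TMPDIR first so that a link there is found before any
# system-wide rsh.
PATH=$TMPDIR:$PATH
export PATH
echo "PATH after:  $PATH"

# Report which rsh would be started from now on.
command -v rsh || echo "no rsh found in PATH"
```

If the last line prints a path outside of $TMPDIR, the link to the wrapper is missing or has a different name.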


Also, the started slaves must be children of sge_execd/sge_shepherd.

-- Reuti

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



