[GE users] jobs never die on nodes with mpich

Jesse Becker jbecker at northwestern.edu
Tue Aug 3 17:36:58 BST 2004


On Mon, Aug 02, 2004 at 10:53:16PM +0200, Reuti wrote:
> Hi,
> 
> >UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
> >sgeadmin  2434     1  2434  1784  0 Jul19 ?        00:20:28 /opt/sge/bin/glinux/sge_execd
> >sgeadmin  9030  2434  9030  1784  0 15:04 ?        00:00:00  \_ sge_shepherd-4889 -bg
> >root      9031  9030  9031  9031  0 15:04 ?        00:00:00      \_ /opt/sge/utilbin/glinux/rshd -l
> >mitch     9032  9031  9032  9031  0 15:04 ?        00:00:00          \_ [qrsh_starter <defunct>]
> 
> Can you provide the output of top showing which of these processes take
> up CPU time, and also a process tree of a running job on a slave node?
> 
> There is (at least) one possibility to bypass the rsh-wrapper. The default 
> $PATH on an execution host is:

Just to throw a little more information into the ring: I have the same
problem, but I use SSH (with passwordless public-key authentication
between the nodes) instead of RSH. So I'm not sure it's specific to rsh.
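
For anyone comparing process trees across rsh and ssh setups, here is a
minimal sketch (plain Python, not part of SGE) that picks the <defunct>
entries out of a listing shaped like the one quoted above and walks up to
their parents. The function names parse_ps and defunct_chains are made up
for illustration, and the column layout is assumed to match the quoted
listing exactly.

```python
# Sketch only: parse a ps listing laid out like the one quoted above
# (UID PID PPID PGID SID C STIME TTY TIME CMD) and report <defunct> entries.
# parse_ps and defunct_chains are illustrative names, not SGE tools.

SAMPLE = r"""
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
sgeadmin  2434     1  2434  1784  0 Jul19 ?        00:20:28 /opt/sge/bin/glinux/sge_execd
sgeadmin  9030  2434  9030  1784  0 15:04 ?        00:00:00  \_ sge_shepherd-4889 -bg
root      9031  9030  9031  9031  0 15:04 ?        00:00:00      \_ /opt/sge/utilbin/glinux/rshd -l
mitch     9032  9031  9032  9031  0 15:04 ?        00:00:00          \_ [qrsh_starter <defunct>]
"""


def parse_ps(text):
    """Map pid -> (ppid, user, command) for every process row in the listing."""
    procs = {}
    for line in text.splitlines():
        # Assumption: the first nine columns contain no embedded spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # skip the header and blank lines
        command = fields[9].lstrip("\\_| ")  # drop the tree-drawing prefix
        procs[int(fields[1])] = (int(fields[2]), fields[0], command)
    return procs


def defunct_chains(procs):
    """For each <defunct> process, list its pid followed by its known ancestors."""
    chains = []
    for pid, (_, _, command) in sorted(procs.items()):
        if "<defunct>" not in command:
            continue
        chain, seen = [], set()
        while pid in procs and pid not in seen:
            seen.add(pid)
            chain.append(pid)
            pid = procs[pid][0]  # step up to the parent
        chains.append(chain)
    return chains


if __name__ == "__main__":
    procs = parse_ps(SAMPLE)
    for chain in defunct_chains(procs):
        print(" <- ".join(f"{p} ({procs[p][1]}: {procs[p][2]})" for p in chain))
```

The second entry in each chain is the direct parent, which is the process
that has not yet reaped the zombie (the rshd with PID 9031 in the quoted
listing). Running the same check on an ssh-based node would show whether
the pattern is the same there.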

-- 
Jesse Becker
GPG-fingerprint: BD00 7AA4 4483 AFCC 82D0  2720 0083 0931 9A2B 06A2




