[GE users] jobs never die on nodes with mpich

Reuti reuti at staff.uni-marburg.de
Mon Aug 2 21:53:16 BST 2004


Hi,

>UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
>sgeadmin  2434     1  2434  1784  0 Jul19 ?        00:20:28 /opt/sge/bin/glinux/sge_execd
>sgeadmin  9030  2434  9030  1784  0 15:04 ?        00:00:00  \_ sge_shepherd-4889 -bg
>root      9031  9030  9031  9031  0 15:04 ?        00:00:00      \_ /opt/sge/utilbin/glinux/rshd -l
>mitch     9032  9031  9032  9031  0 15:04 ?        00:00:00          \_ [qrsh_starter <defunct>]

Can you provide the output of top, showing which of these processes take up
CPU time, and also a process tree of a running job on a slave node?

There is (at least) one way the rsh-wrapper can get bypassed. The default
$PATH on an execution host is:

$TMPDIR:/usr/local/bin:/usr/ucb:/bin:/usr/bin

This way the symbolic link to the rsh-wrapper in $TMPDIR will be found first.
But if you modify the $PATH in your shell script and put anything in front of
it, you may get different behavior. Using -V with qsub may also have some
strange effects, because it exports the submission environment to the job and
thereby alters the $PATH.
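
As a rough illustration of the lookup order, here is a small sketch. It does
not touch a real Grid Engine setup: the two temporary directories and the two
fake "rsh" scripts are made up for the demo, and only the default $PATH layout
is taken from above.

```shell
#!/bin/sh
# Sketch: the first directory in $PATH that contains "rsh" wins.
# Both directories and both fake rsh scripts are made up for this demo.

job_tmp=$(mktemp -d)   # stands in for the per-job $TMPDIR
extra=$(mktemp -d)     # stands in for a directory a job script prepends

printf '#!/bin/sh\necho wrapper\n' > "$job_tmp/rsh"   # fake rsh-wrapper
printf '#!/bin/sh\necho other\n'   > "$extra/rsh"     # some other rsh
chmod +x "$job_tmp/rsh" "$extra/rsh"

# Default layout: $TMPDIR comes first, so the wrapper is picked up.
default_hit=$(PATH="$job_tmp:/usr/local/bin:/usr/ucb:/bin:/usr/bin"; rsh)

# Same $PATH with another directory put in front: the wrapper is bypassed.
prefixed_hit=$(PATH="$extra:$job_tmp:/usr/local/bin:/usr/ucb:/bin:/usr/bin"; rsh)

echo "default \$PATH  -> $default_hit"
echo "prefixed \$PATH -> $prefixed_hit"

rm -rf "$job_tmp" "$extra"
```

The second call finds the other rsh simply because its directory is listed
before the stand-in for $TMPDIR.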

To verify this, just put an "echo $PATH" in the script you are using, at a
point after which you are sure that nothing more will be prepended to the
$PATH by your script/program.
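
A slightly more explicit variant of that check might look like the sketch
below. The helper path_starts_with_tmpdir is my own illustrative name and not
part of Grid Engine; it only tests whether $TMPDIR is the first entry of the
$PATH.

```shell
#!/bin/sh
# Hypothetical check for a job script; not part of Grid Engine.
# Succeeds if the PATH value in $1 begins with the directory in $2.
path_starts_with_tmpdir() {
    [ -n "$2" ] || return 1          # no TMPDIR given: cannot be first
    case "$1" in
        "$2"|"$2":*) return 0 ;;     # TMPDIR is the only or the first entry
        *)           return 1 ;;
    esac
}

echo "PATH=$PATH"
if path_starts_with_tmpdir "$PATH" "${TMPDIR:-}"; then
    echo "TMPDIR is first in PATH"
else
    echo "TMPDIR is NOT first in PATH, rsh may bypass the wrapper"
fi
```

Outside of a job the second message is the expected one, since $TMPDIR is
normally not part of an interactive $PATH.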


In the next step, the wrapper has to locate the real rsh somewhere in the
$PATH. To do this, it removes $TMPDIR from the $PATH and searches for the
real rsh. Is it somewhere in the remaining $PATH, or is what it finds again a
wrapper/symbolic link to something else?
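
To see what such a search would find on your node, something along these
lines could be run inside the job. This is only a sketch of the idea and not
the actual wrapper source; strip_dir_from_path and find_in_path are
hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: NOT the real Grid Engine rsh-wrapper.

# Print the PATH-style list in $1 with every entry equal to $2 removed.
strip_dir_from_path() (
    IFS=:
    set -f                              # no globbing while splitting
    result=
    for entry in $1; do
        if [ "$entry" != "$2" ]; then
            result="${result:+$result:}$entry"
        fi
    done
    printf '%s\n' "$result"
)

# Print the first executable file named $1 in the PATH-style list $2.
find_in_path() (
    IFS=:
    set -f
    for dir in $2; do
        if [ -x "$dir/$1" ] && [ ! -d "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            exit 0
        fi
    done
    exit 1
)

search_path=$(strip_dir_from_path "$PATH" "${TMPDIR:-/nonexistent}")
if real_rsh=$(find_in_path rsh "$search_path"); then
    echo "rsh outside TMPDIR: $real_rsh"
    if [ -L "$real_rsh" ]; then
        echo "note: this is a symbolic link"
        ls -l "$real_rsh"
    fi
else
    echo "no rsh found outside TMPDIR"
fi
```

If the result is again a symbolic link, the ls -l line shows where it points.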


BTW: I have also seen programs and scripts (not using MPICH) which had a
hard-coded "/usr/bin/rsh" inside instead of a plain "rsh", and which also had
to be modified.

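A quick way to look for such hard-coded paths is a recursive grep. The sketch
below runs on a throwaway directory with two made-up scripts; the helper name
find_hardcoded_rsh is hypothetical, and in practice you would point it at the
directory holding your own scripts.

```shell
#!/bin/sh
# Hypothetical helper: list lines below directory $1 that contain the
# literal string /usr/bin/rsh.
find_hardcoded_rsh() {
    # -r: recurse, -n: show line numbers, -F: fixed string, not a regex
    grep -rnF '/usr/bin/rsh' "$1"
}

# Demo on a throwaway directory with two made-up scripts.
demo=$(mktemp -d)
printf '#!/bin/sh\n/usr/bin/rsh "$@"\n' > "$demo/launch.sh"   # hard-coded path
printf '#!/bin/sh\nrsh "$@"\n'          > "$demo/ok.sh"       # plain rsh
find_hardcoded_rsh "$demo"
rm -rf "$demo"
```

Only launch.sh is reported, because ok.sh leaves the lookup to the $PATH.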

Cheers - Reuti
