[GE users] mpich2/mpd TI - slave processes not properly accounted for
reuti at staff.uni-marburg.de
Wed Aug 19 08:46:08 BST 2009
On 19.08.2009 at 02:44, cwchan wrote:
> Thus spoke reuti:
>> On 19.08.2009 at 00:32, cwchan wrote:
>>> We have a small cluster with 20 nodes and 256 x86_64 CPU cores,
>>> SGE 6.1u2 as the DRMS with ssh tight integration instead of rsh.
>> Did you recompile SGE with -tight-ssh? Otherwise that would explain
>> exactly your observations, as the supplied rsh will add an additional
>> group ID which is used to track the consumption. The same is
>> done by the specially compiled ssh. You don't have a private network
>> for your cluster and must use ssh?
>> -- Reuti
> Yes, the sshd binary was compiled with tight integration and is
When you create an interactive session, do you see an additional group ID?
$ qrsh id
uid=1001(reuti) gid=25000(orgqui) groups=1000(operator),20040,25000
(here it is 20040)
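As a rough sketch (not part of SGE), this check can be automated in Python: parse the groups= field of the id output and keep only the IDs that fall inside SGE's range for additional group IDs. The range 20000-20100 is an assumption (a common gid_range default); read the real value with qconf -sconf on your cluster. The function name extra_sge_groups is hypothetical.

```python
import re

# Assumed default SGE gid_range; check the real value with `qconf -sconf`.
DEFAULT_GID_RANGE = (20000, 20100)


def extra_sge_groups(id_output, gid_range=DEFAULT_GID_RANGE):
    """Return the group IDs from `id` output that fall inside gid_range."""
    match = re.search(r"groups=(\S+)", id_output)
    if match is None:
        return []
    low, high = gid_range
    gids = []
    for entry in match.group(1).split(","):
        # Entries look like "1000(operator)" or a bare number like "20040".
        number = re.match(r"\d+", entry)
        if number:
            gid = int(number.group())
            if low <= gid <= high:
                gids.append(gid)
    return gids


if __name__ == "__main__":
    sample = "uid=1001(reuti) gid=25000(orgqui) groups=1000(operator),20040,25000"
    print(extra_sge_groups(sample))  # [20040]
```

An empty result inside a job would suggest that the process was not started through SGE's wrapper and is therefore not tracked.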
> When invoked, that sshd outputs
MPICH2 uses rsh by default, which is caught by SGE's rsh wrapper, and
then SGE uses ssh in the end? At this point "rsh" is only a name.
Someone could compile MPICH2 to call "fubar" as its rsh client,
adjust startmpich2.sh to create a link named "fubar" in $TMPDIR, and SGE
could still use ssh in the end.
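To illustrate that the client name only matters for the PATH lookup, here is a minimal Python sketch: it places a link named fubar in a temporary directory that is searched ahead of the normal PATH, then checks that the lookup lands on the link and, through it, on the wrapper. The wrapper script and the helpers make_rsh_link, resolve and demo are hypothetical stand-ins; in a real setup the link target would be SGE's rsh wrapper and the directory would be the job's $TMPDIR.

```python
import os
import shutil
import tempfile


def make_rsh_link(link_dir, name, wrapper):
    """Create link_dir/name pointing at wrapper, replacing any old link."""
    link = os.path.join(link_dir, name)
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(wrapper, link)
    return link


def resolve(name, link_dir):
    """Look up name on PATH, with link_dir searched before the normal PATH."""
    search_path = link_dir + os.pathsep + os.environ.get("PATH", "")
    return shutil.which(name, path=search_path)


def demo():
    """Return (lookup hits the link, link ends at the wrapper)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical stand-in for SGE's rsh wrapper script.
        wrapper = os.path.join(tmpdir, "wrapper.sh")
        with open(wrapper, "w") as fh:
            fh.write('#!/bin/sh\necho "wrapper called with: $*"\n')
        os.chmod(wrapper, 0o755)

        # The client name is arbitrary: "fubar" works as well as "rsh".
        link = make_rsh_link(tmpdir, "fubar", wrapper)
        found = resolve("fubar", tmpdir)
        return (found == link,
                os.path.realpath(found) == os.path.realpath(wrapper))


if __name__ == "__main__":
    print(demo())  # (True, True)
```

Because the link directory comes first in the search path, whatever the wrapper does afterwards (here, hypothetically, calling ssh) is invisible to the program that asked for "fubar".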
> set_admin_username() with zero length username: No such file or
> The cluster is on a private network with a login node which has
> a separate interface accessible from the public network. We use
> sshd within the cluster for its features, such as X11 forwarding,
> and also because of HIPAA concerns.
Ok, I see.
> Interactive jobs run via qrsh do have their resource usage properly
> accounted for, and the mpich2 jobs are running as children of the
> qrsh process, e.g.
Is the path of the sshd in use shown when you try the following, and is it the correct
$ ps -e f
(f without a leading -)
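Besides the process tree from ps, one can check directly which processes carry the additional group ID, since that ID is what SGE uses to account for usage. Below is a minimal Linux-only sketch in Python, assuming /proc/<pid>/status with its Groups: line is available; the helper names are hypothetical, and 20040 is simply the example ID from the qrsh id output above. Slave processes started outside SGE's wrapper would presumably lack the ID and not appear in the list.

```python
import os


def status_groups(status_text):
    """Parse the supplementary group IDs from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return [int(g) for g in line.split()[1:]]
    return []


def pids_with_group(gid, proc="/proc"):
    """List PIDs whose supplementary groups include gid (Linux /proc only)."""
    if not os.path.isdir(proc):
        return []
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as fh:
                groups = status_groups(fh.read())
        except OSError:
            # The process exited or its status is unreadable; skip it.
            continue
        if gid in groups:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    sample = "Name:\tmpd\nGroups:\t1000 20040 25000 \n"
    print(status_groups(sample))  # [1000, 20040, 25000]
    # 20040 is the example additional group ID from the `qrsh id` output.
    print(pids_with_group(20040))
```

Comparing this list with the ps -e f tree on a node shows which job processes SGE can see.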