[GE users] mpich2/mpd TI - slave processes not properly accounted for

cwchan c-chan at uchicago.edu
Wed Aug 19 17:28:32 BST 2009

Also Sprach reuti:

> When you create an interactive session, do you see an additional group
> id:
> $ qrsh id
> uid=1001(reuti) gid=25000(orgqui) groups=1000(operator),20040,25000(orgqui)
> (here the 20040)

Yes, there is an extra GID.  If I run "qrsh id" twice in quick
succession, the extra GID in the second run is one higher than in
the first.
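
For anyone who wants to script this check, here is a minimal sketch. It
assumes the group SGE adds has no entry in the group database and so
shows up as a bare number in the "id" output, as 20040 does in the
quoted example. The function name unnamed_gids is made up for
illustration.

```shell
#!/bin/sh
# Print the group IDs from `id` output that carry no group name,
# e.g. the 20040 in groups=1000(operator),20040,25000(orgqui).
# Assumption: the additional group added by SGE is unnamed.
unnamed_gids() {
    sed -n 's/.*groups=\([^ ]*\).*/\1/p' | tr ',' '\n' | grep -E '^[0-9]+$'
}

# Sample line taken from the quoted output above; prints 20040.
echo 'uid=1001(reuti) gid=25000(orgqui) groups=1000(operator),20040,25000(orgqui)' | unnamed_gids
```

On a live cluster the input would come from "qrsh id | unnamed_gids"
instead of the sample line. Running it twice should show the
incrementing value described above.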

> MPICH2 uses rsh by default, which is caught by SGE's rsh-wrapper and
> then SGE uses ssh in the end? The name is at this point only a name.
> Someone could compile MPICH2 to call "fubar" as rsh client, and
> adjust startmpich2.sh to create a link "fubar" in $TMPDIR, and SGE
> could use ssh in the end.

Yes, the rsh wrapper script is configured to call qrsh with
the just_wrap option off.

>> Interactive jobs run via qrsh do have their resource usage properly
>> accounted for, and the mpich2 jobs are running as children of the
>> qrsh process, e.g.
>> |-sge_shepherd---sshd---sshd---qrsh_starter---tcsh---python2.4---2*
>> [python2.4---mpilscg]
> Is the path of the used sshd output when you try - is it the correct
> one:
> $ ps -e f
> (f w/o -)
> -- Reuti

Yes, it is the SGE sshd in /usr/share/sge/6.1/bin/lx26-amd64.

The compute nodes do not allow direct ssh logins by non-root users;
/usr/sbin/sshd reads the /etc/ssh/sshd_config file which has 
"AllowUsers root" set.  The cluster config has the rsh/rlogin
command set to /usr/bin/ssh and the rshd/rlogind command set to

/usr/share/sge/6.1/bin/lx26-amd64/sshd -i -f /usr/share/sge/etc/sshd_config

which allows access by all users.  This is meant to force accounting of all
usage on the cluster, including interactive logins.
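
To compare against another site's setup, the relevant entries can be
filtered out of the cluster configuration. This is only a sketch: it
assumes the settings described above correspond to the sge_conf
entries rsh_command, rsh_daemon, rlogin_command and rlogin_daemon, and
the sample input below is reconstructed from this message rather than
pasted from real output. The function name remote_startup_entries and
the shell_start_mode filler line are illustrative.

```shell
#!/bin/sh
# Keep only the rsh/rlogin command and daemon entries from
# cluster configuration text read on stdin.
remote_startup_entries() {
    grep -E '^(rsh|rlogin)_(command|daemon)[[:space:]]'
}

# Sample input reconstructed from the description above; the first
# line is filler to show that unrelated entries are dropped.
remote_startup_entries <<'EOF'
shell_start_mode  posix_compliant
rsh_command       /usr/bin/ssh
rsh_daemon        /usr/share/sge/6.1/bin/lx26-amd64/sshd -i -f /usr/share/sge/etc/sshd_config
rlogin_command    /usr/bin/ssh
rlogin_daemon     /usr/share/sge/6.1/bin/lx26-amd64/sshd -i -f /usr/share/sge/etc/sshd_config
EOF
```

On a real system the input would be "qconf -sconf | remote_startup_entries".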

C. Chan <c-chan at uchicago.edu>
GPG Public Key registered at pgp.mit.edu

