[GE users] mpich2/mpd TI - slave processes not properly accounted for
reuti at staff.uni-marburg.de
Wed Aug 19 19:17:13 BST 2009
On 19.08.2009 at 18:28, cwchan wrote:
> Thus spake reuti:
>> When you create an interactive session, do you see an additional group?
>> $ qrsh id
>> uid=1001(reuti) gid=25000(orgqui) groups=1000(operator),20040,25000
>> (here the 20040)
> Yes, there is an extra GID. If I do two "qrsh id" in quick
> succession, the second GID is incremented by 1 from the first.
>> MPICH2 uses rsh by default, which is caught by SGE's rsh-wrapper and
>> then SGE uses ssh in the end? The name is at this point only a name.
>> Someone could compile MPICH2 to call "fubar" as rsh client, and
>> adjust startmpich2.sh to create a link "fubar" in $TMPDIR, and SGE
>> could use ssh in the end.
> Yes, the rsh wrapper script is configured to call qrsh with
> the just_wrap option off.
>>> Interactive jobs run via qrsh do have their resource usage properly
>>> accounted for, and the mpich2 jobs are running as children of the
>>> qrsh process, e.g.
>> Is the path of the sshd in use shown when you try this, and is it the correct one?
>> $ ps -e f
>> (f w/o -)
>> -- Reuti
> Yes, it is the SGE sshd in /usr/share/sge/6.1/bin/lx26-amd64.
> The compute nodes do not allow direct ssh logins by non-root users;
> /usr/sbin/sshd reads the /etc/ssh/sshd_config file which has
> "AllowUsers root" set. The cluster config has the rsh/rlogin
> command set to /usr/bin/ssh and the rshd/rlogind command set to
> /usr/share/sge/6.1/bin/lx26-amd64/sshd -i -f /usr/share/sge/etc/
> which allows access by all users. This is meant to force
> accounting of all usage on the cluster, including interactive logins.
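The settings described above can be double-checked by dumping the cluster configuration and looking only at the remote-startup entries. The following is a minimal sketch, not a required step: the helper name show_remote_startup is made up for illustration, and it assumes the SGE admin tools are in PATH on the host where it is run.

```shell
# Hypothetical helper: keep only the rsh/rlogin command and daemon entries
# from an SGE configuration dump read on stdin.
show_remote_startup() {
    grep -E '^(rsh|rlogin)_(command|daemon)[[:space:]]'
}

# Usage on a host with the SGE admin tools available (global configuration):
#   qconf -sconf | show_remote_startup
```

With the setup described above, the rsh_command and rlogin_command lines should point to /usr/bin/ssh and the two daemon lines to the SGE-provided sshd.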
All looks fine. Did the job shut down all of its processes and daemons in a
nice way, and were the accounting records written? How many entries
do you have in qacct for such a job? It should be one for the
jobscript (with near zero consumption), and one for each started
daemon per node.
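
A quick way to count the entries could look like the sketch below. The helper names count_records and records_per_host are made up for illustration, and 12345 stands in for the real job ID. It relies on each record printed by qacct -j containing one jobnumber line and one hostname line.

```shell
# Hypothetical helper: count the accounting records in `qacct -j` output
# read from stdin (one "jobnumber" line per record).
count_records() {
    grep -c '^jobnumber'
}

# Hypothetical helper: show how many records each host contributed,
# based on the "hostname" field of every record.
records_per_host() {
    awk '$1 == "hostname" { n[$2]++ } END { for (h in n) print h, n[h] }' | sort
}

# Usage (12345 is a placeholder job ID):
#   qacct -j 12345 | count_records
#   qacct -j 12345 | records_per_host
```

If the count is 1, only the jobscript was recorded and the usage of the slave daemons is missing from the accounting file.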