[GE users] mpich2/mpd TI - slave processes not properly accounted for

cwchan c-chan at uchicago.edu
Wed Aug 19 01:44:52 BST 2009


Thus spoke reuti:

> Hi,
>
> On 19.08.2009 at 00:32, cwchan wrote:
>
>> Hello,
>>
>> We have a small cluster with 20 nodes and 256 x86_64 CPU cores, using
>> SGE 6.1u2 as the DRMS with ssh tight integration instead of rsh.
>
> Did you recompile SGE with -tight-ssh? If not, that would explain
> your observations exactly: the supplied rsh adds an additional group
> ID, which is used to track resource consumption, and the specially
> compiled ssh does the same. Do you not have a private network for
> your cluster, so that you must use ssh?
>
> -- Reuti

Yes, the sshd binary was compiled with tight integration and is in

/usr/share/sge/6.1/bin/lx26-amd64/sshd

When invoked, that sshd outputs

set_admin_username() with zero length username: No such file or directory

The cluster is on a private network with a login node which has
a separate interface accessible from the public network.  We use
sshd within the cluster for its features, such as X11 forwarding,
and also because of HIPAA concerns.
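Reuti's point is that tight integration tracks a job's resource usage through an additional group ID, which the supplied rsh (or the specially compiled sshd) attaches to the processes it starts. On Linux, one way to check on a slave node whether the MPI processes actually carry that ID is to scan the `Groups:` line of each `/proc/<pid>/status` file. This is a minimal, Linux-only sketch; it assumes you already know the job's additional group ID (presumably drawn from the `gid_range` setting in the SGE configuration), and the function names are hypothetical.

```python
import os


def parse_status_groups(status_text):
    """Return the supplementary group IDs on the 'Groups:' line of a
    /proc/<pid>/status file (Linux format); empty set if absent."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            # e.g. "Groups:\t100 20017 " -> {100, 20017}
            return {int(g) for g in line.split()[1:]}
    return set()


def pids_with_group(gid, proc_root="/proc"):
    """Return the sorted PIDs whose supplementary groups include `gid`."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                groups = parse_status_groups(f.read())
        except OSError:
            continue  # process exited meanwhile, or status is unreadable
        if gid in groups:
            matches.append(int(entry))
    return sorted(matches)
```

For example, with a hypothetical job group ID of 20017, `pids_with_group(20017)` lists every process on the node carrying that group; an `mpilscg` process missing from the list would not be charged to the job.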

Interactive jobs run via qrsh do have their resource usage properly
accounted for, and the mpich2 jobs are running as children of the
qrsh process, e.g.

|-sge_shepherd---sshd---sshd---qrsh_starter---tcsh---python2.4---2*[python2.4---mpilscg]

where mpilscg is the name of the user's mpich2 program.
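The `pstree` excerpt shows the MPI processes nested under `sge_shepherd`. A similar check can be scripted on any node by following the `PPid:` field in `/proc/<pid>/status` up the tree. This is a minimal, Linux-only sketch with hypothetical function names; a process that does not descend from `sge_shepherd` on its node (for example, one spawned by a daemon started outside SGE) would generally not inherit the job's additional group ID either.

```python
import os


def read_status(pid, proc_root="/proc"):
    """Parse /proc/<pid>/status into a dict mapping field name -> value."""
    fields = {}
    with open(os.path.join(proc_root, str(pid), "status")) as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def ancestry(pid, proc_root="/proc"):
    """Return [(pid, name), ...] from `pid` up to the top of its tree."""
    chain = []
    seen = set()
    while pid > 0 and pid not in seen:  # PPid 0 marks the top of the tree
        seen.add(pid)
        try:
            status = read_status(pid, proc_root)
        except OSError:
            break  # process vanished or its status is unreadable
        chain.append((pid, status.get("Name", "?")))
        pid = int(status.get("PPid", "0"))
    return chain


def under_shepherd(pid, proc_root="/proc"):
    """True if some proper ancestor of `pid` is named sge_shepherd."""
    return any(name == "sge_shepherd"
               for _, name in ancestry(pid, proc_root)[1:])
```

Running `under_shepherd(pid)` for each `mpilscg` PID on a slave node would flag any copy that was started outside the shepherd's process tree.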


-- 
C. Chan <c-chan at uchicago.edu>
GPG Public Key registered at pgp.mit.edu

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=212952

More information about the gridengine-users mailing list