[GE users] mpich2/mpd TI - slave processes not properly accounted for

reuti reuti at staff.uni-marburg.de
Wed Aug 19 00:40:29 BST 2009


On 19.08.2009 at 00:32, cwchan wrote:

> Hello,
> We have a small cluster with 20 nodes and 256 x86_64 CPU cores, using
> SGE 6.1u2 as the DRMS with ssh tight integration instead of rsh.

Did you recompile SGE with -tight-ssh? If not, that would explain
exactly what you observe: the rsh supplied with SGE adds an additional
group ID, which is used to track the resource consumption. The
specially compiled ssh does the same. Is there no private network
for your cluster, so that you must use ssh?
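
To see whether the slave processes actually carry such an additional
group ID, you could look at the "Groups:" line of /proc/<pid>/status
on a compute node while a job is running. Below is a minimal sketch in
Python, assuming a Linux node; the function names and the range
20000-20100 are only illustrative, so use whatever gid_range is set
in your cluster configuration.

```python
def supplementary_groups(status_text):
    """Return the group IDs from the 'Groups:' line of a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            # Everything after the label is a whitespace-separated list of GIDs.
            return [int(gid) for gid in line.split()[1:]]
    return []


def has_tracking_gid(status_text, gid_range):
    """True if any group ID falls inside the inclusive (low, high) range."""
    low, high = gid_range
    return any(low <= gid <= high for gid in supplementary_groups(status_text))


def read_status(pid):
    """Read /proc/<pid>/status (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return handle.read()


# Illustrative input; the range below is an assumed gid_range, not yours.
sample = "Name:\tmpd\nUid:\t501\t501\t501\t501\nGroups:\t100 20005 \n"
print(supplementary_groups(sample))
print(has_tracking_gid(sample, (20000, 20100)))
```

If has_tracking_gid(read_status(pid), ...) is true for the master
process but false for the slaves, the slaves were started outside of
SGE's control and their usage cannot be accounted for.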

-- Reuti

> We set up the mpich2/mpd tight integration according to Reuti's
> HOWTO, and it seems to work properly - a master process is started
> on a compute node, slave processes are started on other compute nodes,
> qdel works, and processes terminate properly without leaving orphans
> when the job completes normally.
> We also use the ARCo accounting system and the Java Webconsole to
> query the accounting data, which is in a PostgreSQL database.  When
> looking at the user accounting data, the CPU and memory usage seem
> to take into account the resources used by the mpich2 master process
> but not those used by any of the slaves.  Accurate accounting is
> required by our funding agency, so this presents something of a problem.
> Jobs which are started interactively via qrsh/qlogin are tracked
> properly by the accounting system, so I'm not certain where the
> problem might be, and would be grateful for some hints as to
> where to look.
> -- 
> C. Chan <c-chan at uchicago.edu>
> GPG Public Key registered at pgp.mit.edu
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=212928
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].

