[GE users] mpich2/mpd TI - slave processes not properly accounted for

cwchan c-chan at uchicago.edu
Tue Aug 18 23:32:34 BST 2009


We have a small cluster with 20 nodes and 256 x86_64 CPU cores, using
SGE 6.1u2 as the DRMS with ssh tight integration instead of rsh.

We set up the mpich2/mpd tight integration according to Reuti's
HOWTO, and it seems to work properly: a master process is started
on one compute node, slave processes are started on the other compute
nodes, qdel works, and all processes terminate cleanly without leaving
orphans when the job completes normally.

We also use the ARCO accounting system and the Java Webconsole to
query the accounting data, which is stored in a PostgreSQL database.
When looking at the per-user accounting data, the CPU and memory usage
figures seem to include the resources used by the mpich2 master process
but not those used by any of the slaves.  Our funding agency requires
accurate accounting, so this is something of a problem.

Jobs started interactively via qrsh/qlogin are tracked properly by
the accounting system, so I'm not sure where the problem lies and
would be grateful for some hints as to where to look.
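One way to narrow down where the slave usage goes missing is to check whether the raw SGE accounting file (normally `$SGE_ROOT/$SGE_CELL/common/accounting`) already contains per-task records for the slaves before dbwriter loads them into ARCO. If the slave records are in the file, the loss is on the ARCO/dbwriter side; if they are absent, the integration itself is not reporting them. The sketch below is hypothetical: it assumes the colon-separated field layout described in accounting(5) for SGE 6.x (job_number at position 5, cpu at 36, pe_taskid at 41, counting from 0) and assumes the master task's record carries `NONE` in pe_taskid. Check both assumptions against the man page for your installation.

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable

# ASSUMED 0-based field positions, taken from the accounting(5) layout
# for SGE 6.x; verify against the man page of your installation.
JOB_NUMBER = 5
CPU = 36
PE_TASKID = 41


def summarize(lines: Iterable[str]) -> Dict[str, dict]:
    """Per job: count records, count per-task (slave) records, sum cpu."""
    jobs: Dict[str, dict] = defaultdict(
        lambda: {"records": 0, "slave_records": 0, "cpu": 0.0})
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the comment header and blank lines
        fields = line.rstrip("\n").split(":")
        if len(fields) <= PE_TASKID:
            continue  # too short for the assumed layout; ignore it
        job = jobs[fields[JOB_NUMBER]]
        job["records"] += 1
        job["cpu"] += float(fields[CPU])
        # ASSUMPTION: the master record has pe_taskid "NONE",
        # while tasks started via qrsh -inherit carry a task id.
        if fields[PE_TASKID] != "NONE":
            job["slave_records"] += 1
    return dict(jobs)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: summarize.py <path-to-accounting-file>")
    with open(sys.argv[1]) as f:
        for job_id, s in sorted(summarize(f).items()):
            print(f"{job_id}: {s['records']} records, "
                  f"{s['slave_records']} slave, cpu={s['cpu']:.1f}s")
```

For a parallel job that used N nodes, one would expect to see roughly one master record plus one record per slave task; a job showing only a single record would point at the integration rather than at ARCO.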

C. Chan <c-chan at uchicago.edu>
GPG Public Key registered at pgp.mit.edu

