[GE users] mpich2/mpd TI - slave processes not properly accounted for

cwchan c-chan at uchicago.edu
Wed Aug 19 21:07:12 BST 2009


Thus spoke Reuti:

> All looks fine. The job shut down all of its processes and daemons in a
> nice way, and the accounting records were written? How many entries
> do you have in qacct for such a job? It should be one for the job
> script (with near-zero consumption), and one for each started
> daemon per node.
>
> -- Reuti

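To verify that, the individual records for one such job can be listed by
job id (4711 below is just a placeholder for a real tightly integrated job):

# qacct -j 4711
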
qacct does seem to record the usage properly.  We have a queue named
distmem.q for distributed memory jobs, which requires an MPI or PVM
PE to be specified and allows interactive logins so qrsh can start
the mpd slave processes:

# qacct -q distmem.q

HOST      CLUSTER QUEUE     WALLCLOCK    UTIME         STIME     CPU          MEMORY      IO        IOW
=========================================================================================================
node1     distmem.q         3824499      20148830      8744      33760467     1936322.664 0.000     0.000
node2     distmem.q         4612359      20919086      6998      39907483     2629250.162 0.000     0.000
node3     distmem.q         3047254      5801404       2726      25828364     2179188.361 0.000     0.000
node4     distmem.q         4968677      13413948      6012      49133903     3770743.466 0.000     0.000
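
As a further cross-check that the raw data is intact before dbwriter ever
sees it, the per-user usage can also be summed straight from the accounting
file. This is only a rough sketch: "someuser" and the "default" cell path
are placeholders, and the field numbers (owner=4, cpu=37, mem=38) are taken
from accounting(5) on our 6.x install, so double-check them on yours:

# placeholders: "someuser", the "default" cell; field numbers per accounting(5)
awk -F: '$4 == "someuser" { cpu += $37; mem += $38 }
         END { printf "cpu=%.0f  mem=%.3f\n", cpu, mem }' \
    $SGE_ROOT/default/common/accounting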

However, the ARCO report shows almost zero CPU and memory usage for the user
running this job. It appears that SGE itself is tracking usage properly, but
somewhere in sgedbwriter's parsing of the accounting file and writing to the
PostgreSQL ARCO database the information is lost or mangled.
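
To see whether those usage values ever reach the database at all, the ARCO
tables can be queried directly. The database name and the table and column
names below are only my recollection of the 6.x dbwriter schema (sge_job
joined to sge_job_usage), so treat them as assumptions and adjust against
\d in psql:

# table/column names are assumptions; verify against your ARCO schema
psql -d arco -c "
    SELECT j.j_job_number, u.ju_cpu, u.ju_mem
    FROM   sge_job j JOIN sge_job_usage u ON u.ju_parent = j.j_id
    WHERE  j.j_owner = 'someuser'
    ORDER  BY j.j_job_number DESC LIMIT 20"

If those rows also show non-zero cpu, the problem is presumably in the
reporting queries rather than in dbwriter itself.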

-- 
C. Chan <c-chan at uchicago.edu>
GPG Public Key registered at pgp.mit.edu
