[GE users] mpich2/mpd TI - slave processes not properly accounted for

reuti reuti at staff.uni-marburg.de
Wed Aug 19 21:30:16 BST 2009


On 19.08.2009 at 22:07, cwchan wrote:

> Also Sprach reuti:
>
>> All looks fine. The job shut down all processes and daemons in a
>> nice way, and the accounting records were written? How many entries
>> do you have in qacct for such a job? There should be one for the
>> jobscript (with near-zero consumption), and one for each started
>> daemon per node.
>>
>> -- Reuti
>
> qacct does seem to record the usage properly.  We have a queue named
> distmem.q for distributed memory jobs, which requires an MPI or PVM
> PE to be specified and allows interactive logins so qrsh can start
> the mpd slave processes:
>
> # qacct -q distmem.q
>
> HOST    CLUSTER QUEUE  WALLCLOCK  UTIME     STIME  CPU       MEMORY       IO     IOW
> ======================================================================================
> node1   distmem.q      3824499    20148830  8744   33760467  1936322.664  0.000  0.000
> node2   distmem.q      4612359    20919086  6998   39907483  2629250.162  0.000  0.000
> node3   distmem.q      3047254    5801404   2726   25828364  2179188.361  0.000  0.000
> node4   distmem.q      4968677    13413948  6012   49133903  3770743.466  0.000  0.000
>
> However, the ARCO report shows almost zero CPU and memory usage for
> the user running this job. It appears that SGE itself is tracking
> usage properly, but somewhere in the sgedbwriter's parsing of the
> accounting file and writing to the PostgreSQL ARCO database the
> information is lost or mangled.

Hmm, I haven't used ARCO myself so far. But shouldn't SGE's
accounting file be cleared once these records have been written to
the database? Then there shouldn't be any entries left for qacct to
work with at all.

-- Reuti

> -- 
> C. Chan <c-chan at uchicago.edu>
> GPG Public Key registered at pgp.mit.edu
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213122
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
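To check what the accounting file itself holds for one tightly integrated job (per the thread: one near-zero record for the jobscript plus one record for each started mpd daemon per node), the records can be grouped by job number. Below is a minimal Python sketch, assuming the colon-separated layout described in the SGE 6.x accounting(5) man page (0-based: hostname at index 1, job_number at 5, cpu at 36, mem at 37) and assuming no field contains a colon. The helper names `summarize` and `print_summary` are hypothetical. The file usually lives at $SGE_ROOT/$SGE_CELL/common/accounting.

```python
from collections import defaultdict

# 0-based field positions in the colon-separated SGE accounting file.
# ASSUMPTION: taken from the accounting(5) layout of SGE 6.x; verify
# against the man page of your installation before relying on them.
HOSTNAME = 1
JOB_NUMBER = 5
CPU = 36
MEM = 37


def summarize(lines):
    """Group accounting records by job number (hypothetical helper).

    Returns {job_number: {"records": n, "hosts": set, "cpu": c, "mem": m}}.
    For a tightly integrated mpich2/mpd job one would expect one record for
    the jobscript plus one record per mpd daemon started via qrsh.
    """
    jobs = defaultdict(lambda: {"records": 0, "hosts": set(), "cpu": 0.0, "mem": 0.0})
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the "#" comment header at the top of the file.
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) <= MEM:
            continue  # truncated or malformed record
        job = jobs[int(fields[JOB_NUMBER])]
        job["records"] += 1
        job["hosts"].add(fields[HOSTNAME])
        job["cpu"] += float(fields[CPU])
        job["mem"] += float(fields[MEM])
    return dict(jobs)


def print_summary(path):
    """Print one line per job found in the accounting file at `path`."""
    with open(path) as f:
        for job_number, s in sorted(summarize(f).items()):
            print(f"job {job_number}: {s['records']} records on "
                  f"{len(s['hosts'])} host(s), cpu={s['cpu']:.0f}, mem={s['mem']:.3f}")
```

Calling `print_summary` on the accounting file and comparing a job's summed cpu and mem with the figures ARCO reports for the same job would show whether the usage is already missing in the raw records or only in the database.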

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213127