[GE users] PE Accounting

templedf dan.templeton at sun.com
Fri Nov 13 15:14:19 GMT 2009

reuti wrote:
> On 13.11.2009 at 15:48, templedf wrote:
>> Can someone point out what I'm missing here?  I have a tightly
>> integrated parallel job.  It runs for about 11 minutes.  Just before it
>> ends, I run a qstat -j, and I get the following usage line:
>> usage    1:               cpu=00:40:53, mem=652.32980 GBs, io=0.00000,
>> vmem=23.023M, maxvmem=50.125M
>> As far as I can tell, it's working as intended.  I have 40 CPU-minutes
>> in 11 minutes of execution, which corresponds to the 4 slots on which
>> the job is running.  BUT, after the job ends, I run qacct -j, and it
>> shows me:
>> cpu          9.040
>> mem          0.369
>> io           0.000
>> iow          0.000
>> maxvmem      50.125M
>> It got the maxvmem right, but the CPU time is clearly only for the
>> master task.  Even better, a little further up in the qacct output  
>> we see:
>> ru_wallclock 726
>> ru_utime     1356.589
>> ru_stime     104.770
>> which says that the job consumed twice as much CPU time as wallclock
>> time.  Huh?
> Is it a job with 2 threads on the master node of the job? This is
> what the kernel sees on its own, while "cpu" is accounted by the
> additional GID.

There's a slave task on the master node, and it's trying to use two 
threads, yes.  There's only one CPU, though.
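
To put the three figures side by side, here is a minimal Python sketch of
the arithmetic, using only the numbers quoted above.  The helper name
hms_to_seconds is made up for illustration, and nothing here queries SGE.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string (the format of cpu= in qstat -j) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the qstat -j and qacct -j output quoted above.
wallclock = 726                          # ru_wallclock, in seconds
qstat_cpu = hms_to_seconds("00:40:53")   # "cpu" from qstat -j, just before the job ended
qacct_cpu = 9.040                        # "cpu" from qacct -j
rusage_cpu = 1356.589 + 104.770          # ru_utime + ru_stime from qacct -j

# Print each CPU figure and its ratio to the wallclock time.
for label, value in [
    ("qstat -j cpu", qstat_cpu),
    ("qacct -j cpu", qacct_cpu),
    ("ru_utime + ru_stime", rusage_cpu),
]:
    print(f"{label:<20} {value:10.3f} s  {value / wallclock:6.3f} x wallclock")
```

The qstat figure comes out at roughly 3.4 times the wallclock time, the
ru_utime + ru_stime sum at roughly 2.0 times, and the qacct cpu value at
about 0.01 times.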

>> Why do I have three different CPU time values that don't agree with
>> each other?  Am I just misunderstanding the numbers?
> Which version of SGE, and which startup method (builtin or traditional
> rsh)?

It's 6.2u5alpha2 with builtin interactive support.  It's actually a 
completely default installation with my PE added in.

> How many records do you have in the qacct output for the job - one or
> more?

For each of the jobs in both of the emails I wrote, there is only one
entry in accounting.
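
One way to check the record count without reading the output by eye is
sketched below.  It assumes that each record in the qacct -j output is
introduced by a separator line made up only of "=" characters.  The
function name count_accounting_records and the sample text are made up
for illustration, and the sample is not real output from this job.

```python
def count_accounting_records(qacct_output: str) -> int:
    """Count records in qacct -j text.

    Assumption: every record starts with a line consisting solely of '='.
    """
    count = 0
    for line in qacct_output.splitlines():
        stripped = line.strip()
        # A separator line is non-empty and contains nothing but '='.
        if stripped and set(stripped) == {"="}:
            count += 1
    return count


# Hypothetical, heavily abbreviated sample with two records.
separator = "=" * 62
sample = "\n".join([
    separator,
    "qname        all.q",
    "cpu          9.040",
    separator,
    "qname        all.q",
    "cpu          1.000",
])

print(count_accounting_records(sample))
```

To use it on a real job, capture the output of qacct -j for that job
into a string and pass it to the function.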

> What is the setting of "accounting_summary" in the PE?

In both PEs, accounting_summary is TRUE.  control_slaves is TRUE as well,
and job_is_first_task is FALSE.


> -- Reuti
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=226688
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

