[GE users] PE Accounting

templedf dan.templeton at sun.com
Fri Nov 13 15:24:00 GMT 2009


Good tip on the accounting summary.  I had forgotten about that field.  
I set it to FALSE and tried again.  This time I got an accounting record 
for each slave, and the sum of the cpu fields matched the CPU time 
reported by qstat -j.  I'll test this issue on u4 and file a bug report 
if I find it there, too.
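Here is a minimal Python sketch of that cross-check. It assumes `qacct -j` prints one `cpu <seconds>` line per record, as in the output quoted below, and that qstat -j reports cpu as HH:MM:SS with no day prefix. The function names are hypothetical helpers, not part of Grid Engine.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string (qstat -j's cpu= field) to seconds.

    Assumes no day prefix; longer-running jobs may print one.
    """
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def sum_qacct_cpu(qacct_text: str) -> float:
    """Sum the 'cpu' field over every record in `qacct -j` output.

    Assumes each record contains one line of the form 'cpu <seconds>'.
    """
    total = 0.0
    for line in qacct_text.splitlines():
        fields = line.split()
        # Only two-token 'cpu <value>' lines count; ru_* and other lines are skipped.
        if len(fields) == 2 and fields[0] == "cpu":
            total += float(fields[1])
    return total
```

To run the check, pass the job's qacct -j text to `sum_qacct_cpu` and compare the result with `hms_to_seconds("00:40:53")` (2453 s). This shows whether the per-task records add up to the usage that qstat -j reported.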

Daniel

templedf wrote:
> reuti wrote:
>   
>> On 13.11.2009, at 15:48, templedf wrote:
>>
>>> Can someone point out what I'm missing here?  I have a tightly
>>> integrated parallel job.  It runs for about 11 minutes.  Just before
>>> it ends, I run a qstat -j, and I get the following usage line:
>>>
>>> usage    1:               cpu=00:40:53, mem=652.32980 GBs, io=0.00000,
>>> vmem=23.023M, maxvmem=50.125M
>>>
>>> As far as I can tell, it's working as intended.  I have 40 CPU-minutes
>>> in 11 minutes of execution, which corresponds to the 4 slots on which
>>> the job is running.  BUT, after the job ends, I run qacct -j, and it
>>> shows me:
>>>
>>> cpu          9.040
>>> mem          0.369
>>> io           0.000
>>> iow          0.000
>>> maxvmem      50.125M
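The slot estimate above is simple arithmetic on the qstat -j usage line. The sketch below assumes the usage line keeps the `name=value, name=value` layout shown. It takes the poster's figure of about 11 minutes (660 s) as the wallclock time. `parse_usage_line` and `effective_slots` are hypothetical names.

```python
def parse_usage_line(line: str) -> dict:
    """Split a qstat -j 'usage' line into a {name: value} dict of strings."""
    # Everything after the first ':' (the "usage    1:" prefix) is the stats list.
    _, _, stats = line.partition(":")
    usage = {}
    for item in stats.split(","):
        name, sep, value = item.strip().partition("=")
        if sep:
            usage[name] = value
    return usage


def effective_slots(cpu_hms: str, wall_seconds: float) -> float:
    """CPU seconds divided by wallclock seconds: roughly how many CPUs were busy."""
    hours, minutes, seconds = (int(part) for part in cpu_hms.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) / wall_seconds


line = ("usage    1:               cpu=00:40:53, mem=652.32980 GBs, "
        "io=0.00000, vmem=23.023M, maxvmem=50.125M")
usage = parse_usage_line(line)
print(round(effective_slots(usage["cpu"], 11 * 60), 2))  # 3.72, close to 4 slots
```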
>>>
>>> It got the maxvmem right, but the CPU time is clearly only for the
>>> master task.  Even better, a little further up in the qacct output
>>> we see:
>>>
>>> ru_wallclock 726
>>> ru_utime     1356.589
>>> ru_stime     104.770
>>>
>>> which says that the job consumed twice as much CPU time as wallclock
>>> time.  Huh?
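The "twice as much" figure comes from adding user and system time and dividing by wallclock time. The sketch below uses the values quoted above. `cpu_to_wallclock_ratio` is a hypothetical helper.

```python
def cpu_to_wallclock_ratio(ru_utime: float, ru_stime: float, ru_wallclock: float) -> float:
    """(user + system CPU seconds) / elapsed wallclock seconds."""
    return (ru_utime + ru_stime) / ru_wallclock


# Values from the qacct -j output quoted above.
ratio = cpu_to_wallclock_ratio(1356.589, 104.770, 726)
print(f"{ratio:.2f}")  # 2.01
```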
>>>
>> Is it a job with two threads on the master node of the job? That is
>> what the kernel sees on its own, while "cpu" is accounted via the
>> additional group ID (GID).
>>
>
> There's a slave task on the master node, and it's trying to use two 
> threads, yes.  There's only one CPU, though.
>
>   
>>   
>>     
>>> Why do I have three different CPU time values that don't agree with
>>> each other?  Am I just misunderstanding the numbers?
>>>
>> Which version of SGE and which startup method (builtin or traditional
>> rsh)?
>>
>
> It's 6.2u5alpha2 with builtin interactive support.  It's actually a 
> completely default installation with my PE added in.
>
>   
>> How many records do you have in the qacct output for the job: one or
>> more?
>>
>
> For each of the jobs in both of the emails I wrote, there is only one
> entry in accounting.
>
>   
>> What is the setting of "accounting_summary" in the PE?
>>   
>>     
>
> In both PEs, accounting_summary is TRUE, control_slaves is TRUE,
> and job_is_first_task is FALSE.
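The sge_pe(5) man page says accounting_summary is only consulted when control_slaves is TRUE. In that case, TRUE yields a single summary record for the job, and FALSE yields a separate record for the master task and for every slave task. The sketch below checks which behaviour a PE requests. It assumes `qconf -sp <pe_name>` output has one `name value` pair per line and no backslash-continued lines. The helper names are hypothetical.

```python
def parse_pe_config(text: str) -> dict:
    """Parse 'name value' lines, as printed by `qconf -sp <pe_name>`."""
    config = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts:
            config[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return config


def expects_per_task_records(config: dict) -> bool:
    """True if the PE should yield one accounting record per task.

    Per sge_pe(5): accounting_summary is only checked when control_slaves
    is TRUE, and FALSE then means a record for the master and each slave task.
    """
    control_slaves = config.get("control_slaves", "FALSE").upper() == "TRUE"
    summary = config.get("accounting_summary", "FALSE").upper() == "TRUE"
    return control_slaves and not summary
```

With the settings described here (control_slaves TRUE, accounting_summary TRUE), `expects_per_task_records` returns False, which matches the single accounting entry reported above.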
>
> Thanks,
> Daniel
>
>   
>> -- Reuti
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=226688
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
>>     
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=226692
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=226694
