[GE users] PE Accounting

templedf dan.templeton at sun.com
Fri Nov 13 15:56:15 GMT 2009


The new issue is:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=3179

Daniel

reuti wrote:
> On 13.11.2009 at 16:24, templedf wrote:
>
>   
>> Good tip on the accounting summary.  I had forgotten about that field.
>> I set it to FALSE and tried again.  This time I got an accounting record
>> for each slave, and the sum of the cpu fields matched the CPU time
>> reported by qstat -j.  I'll test this issue on u4 and file a bug report
>> if I find it there, too.
>>     
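
The check described above (one accounting record per slave, with the cpu fields added up) can be sketched roughly as follows. This assumes that saved `qacct -j` output carries one `cpu <seconds>` line per record, in the plain numeric form quoted further down this thread. The function name `sum_qacct_field` and the sample values are hypothetical and only for illustration.

```python
from typing import Tuple


def sum_qacct_field(qacct_text: str, field: str = "cpu") -> Tuple[int, float]:
    """Return (number of records that carry `field`, sum of its values)."""
    count = 0
    total = 0.0
    for line in qacct_text.splitlines():
        parts = line.split()
        # A matching line looks like "cpu          9.040".
        if len(parts) >= 2 and parts[0] == field:
            total += float(parts[1])
            count += 1
    return count, total


# Illustrative input only: these values are invented, not taken from a real job.
SAMPLE = """\
==============================================================
cpu          9.040
maxvmem      50.125M
==============================================================
cpu          600.500
maxvmem      48.000M
"""

records, cpu_seconds = sum_qacct_field(SAMPLE)
print(f"{records} records, cpu total = {cpu_seconds:.3f} s")
```

The resulting total is the number to compare against the cpu value that qstat -j reports while the job is still running.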
>
> There were already these:
>
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=2756
>
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=2787
>
> Maybe it's related, although they ought to be fixed.
>
> -- Reuti
>
>
>   
>> Daniel
>>
>> templedf wrote:
>>     
>>> reuti wrote:
>>>
>>>       
>>>> On 13.11.2009 at 15:48, templedf wrote:
>>>>
>>>>
>>>>
>>>>         
>>>>> Can someone point out what I'm missing here?  I have a tightly
>>>>> integrated parallel job.  It runs for about 11 minutes.  Just before
>>>>> it ends, I run a qstat -j, and I get the following usage line:
>>>>>
>>>>> usage    1:               cpu=00:40:53, mem=652.32980 GBs, io=0.00000, vmem=23.023M, maxvmem=50.125M
>>>>>
>>>>> As far as I can tell, it's working as intended.  I have 40 CPU-minutes
>>>>> in 11 minutes of execution, which corresponds to the 4 slots on which
>>>>> the job is running.  BUT, after the job ends, I run qacct -j, and it
>>>>> shows me:
>>>>>
>>>>> cpu          9.040
>>>>> mem          0.369
>>>>> io           0.000
>>>>> iow          0.000
>>>>> maxvmem      50.125M
>>>>>
>>>>> It got the maxvmem right, but the CPU time is clearly only for the
>>>>> master task.  Even better, a little further up in the qacct output
>>>>> we see:
>>>>>
>>>>> ru_wallclock 726
>>>>> ru_utime     1356.589
>>>>> ru_stime     104.770
>>>>>
>>>>> which says that the job consumed twice as much CPU time as wallclock
>>>>> time.  Huh?
>>>>>
>>>>>
>>>>>           
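
One way to put the three figures quoted above side by side is to divide each CPU total by ru_wallclock, which gives the average number of busy CPUs each figure implies. The sketch below uses only the numbers from this message. The helper name `hms_to_seconds` and the variable names are hypothetical, and the arithmetic does not settle which figure is the correct one.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string, as shown by qstat -j, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


wallclock = 726                          # ru_wallclock from qacct -j
qstat_cpu = hms_to_seconds("00:40:53")   # cpu from the qstat -j usage line
qacct_cpu = 9.040                        # cpu from qacct -j
rusage_cpu = 1356.589 + 104.770          # ru_utime + ru_stime from qacct -j

for label, cpu in [("qstat cpu", qstat_cpu),
                   ("qacct cpu", qacct_cpu),
                   ("ru_utime + ru_stime", rusage_cpu)]:
    # CPU seconds per wallclock second = average number of busy CPUs
    print(f"{label:20s} {cpu:10.3f} s  -> {cpu / wallclock:.2f} CPUs on average")
```

Read this way, the qstat value implies between three and four busy CPUs, the ru_* values imply about two, and the qacct cpu value implies almost none.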
>>>> Is it a job with 2 threads on the master node of the job? This is
>>>> what the kernel sees on its own, while "cpu" is accounted by the
>>>> additional GID.
>>>>
>>>>
>>>>         
>>> There's a slave task on the master node, and it's trying to use two
>>> threads, yes.  There's only one CPU, though.
>>>
>>>
>>>       
>>>>         
>>>>> Why do I have three different CPU time values that don't agree with
>>>>> each other?  Am I just misunderstanding the numbers?
>>>>>
>>>>>
>>>>>           
>>>> Which version of SGE and which startup method (builtin or traditional
>>>> rsh)?
>>>>
>>>>
>>>>         
>>> It's 6.2u5alpha2 with builtin interactive support.  It's actually a
>>> completely default installation with my PE added in.
>>>
>>>
>>>       
>>>> How many records do you have in the qacct output for the job: one or
>>>> more?
>>>>
>>>>
>>>>         
>>> For each of the jobs in both the emails I wrote there is only one entry
>>> in accounting.
>>>
>>>
>>>       
>>>> What is the setting of "accounting_summary" in the PE?
>>>>
>>>>
>>>>         
>>> In both PEs, accounting_summary is TRUE, control_slaves is TRUE,
>>> and job_is_first_task is FALSE.
>>>
>>> Thanks,
>>> Daniel
>>>
>>>
>>>       
>>>> -- Reuti
>>>>
>>>> ------------------------------------------------------
>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=226688
>>>>
>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>
>>>>
>>>>         
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=226710

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


