[GE users] PE Accounting

reuti reuti at staff.uni-marburg.de
Fri Nov 13 15:43:04 GMT 2009


On 13.11.2009 at 16:24, templedf wrote:

> Good tip on the accounting summary.  I had forgotten about that field.
> I set it to FALSE and tried again.  This time I got an accounting record
> for each slave, and the sum of the cpu fields matched the CPU time
> reported by qstat -j.  I'll test this issue on u4 and file a bug report
> if I find it there, too.

These issues have already been filed:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=2756

http://gridengine.sunsource.net/issues/show_bug.cgi?id=2787

Maybe they are related; in any case, they ought to be fixed.
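
For anyone who wants to repeat the check described above (accounting_summary set to FALSE, one record per slave, sum of the cpu fields compared with qstat -j), here is a minimal sketch. It assumes the qacct -j output has been saved as text and that each record carries one line of the form "cpu <seconds>", as in the output quoted below. The function names are my own, not part of SGE.

```python
def sum_qacct_cpu(qacct_output: str) -> tuple[int, float]:
    """Count the 'cpu' lines in saved `qacct -j` text and add up their values.

    Assumes each accounting record has exactly one line 'cpu <seconds>'.
    """
    records = 0
    total = 0.0
    for line in qacct_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "cpu":
            records += 1
            total += float(fields[1])
    return records, total


def qstat_cpu_to_seconds(text: str) -> int:
    """Convert a qstat -j cpu value such as '00:40:53' to seconds.

    Accepts hh:mm:ss, or d:hh:mm:ss as an assumption for longer jobs.
    """
    parts = [int(p) for p in text.split(":")]
    while len(parts) < 4:
        parts.insert(0, 0)  # pad missing leading units (days) with zero
    days, hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    # Only the first value comes from this thread; the second is made up.
    sample = "cpu          9.040\nmem          0.369\ncpu          1.500\n"
    print(sum_qacct_cpu(sample))
    print(qstat_cpu_to_seconds("00:40:53"))
```

With accounting_summary TRUE you would expect a single record, so a large gap between the two numbers points at the slave tasks' usage being missing from it.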

-- Reuti


>
> Daniel
>
> templedf wrote:
>> reuti wrote:
>>
>>> On 13.11.2009 at 15:48, templedf wrote:
>>>
>>>
>>>
>>>> Can someone point out what I'm missing here?  I have a tightly
>>>> integrated parallel job.  It runs for about 11 minutes.  Just before
>>>> it ends, I run a qstat -j, and I get the following usage line:
>>>>
>>>> usage    1:               cpu=00:40:53, mem=652.32980 GBs, io=0.00000, vmem=23.023M, maxvmem=50.125M
>>>>
>>>> As far as I can tell, it's working as intended.  I have 40 CPU-minutes
>>>> in 11 minutes of execution, which corresponds to the 4 slots on which
>>>> the job is running.  BUT, after the job ends, I run qacct -j, and it
>>>> shows me:
>>>>
>>>> cpu          9.040
>>>> mem          0.369
>>>> io           0.000
>>>> iow          0.000
>>>> maxvmem      50.125M
>>>>
>>>> It got the maxvmem right, but the CPU time is clearly only for the
>>>> master task.  Even better, a little further up in the qacct output
>>>> we see:
>>>>
>>>> ru_wallclock 726
>>>> ru_utime     1356.589
>>>> ru_stime     104.770
>>>>
>>>> which says that the job consumed twice as much CPU time as wallclock
>>>> time.  Huh?
>>>>
>>>>
>>> Is it a job with 2 threads on the master node of the job? This is
>>> what the kernel sees on its own, while "cpu" is accounted by the
>>> additional GID.
>>>
>>>
>>
>> There's a slave task on the master node, and it's trying to use two
>> threads, yes.  There's only one CPU, though.
>>
>>
>>>
>>>
>>>> Why do I have three different CPU time values that don't agree with
>>>> each other?  Am I just misunderstanding the numbers?
>>>>
>>>>
>>> Which version of SGE, and which startup method (builtin or traditional
>>> rsh)?
>>>
>>>
>>
>> It's 6.2u5alpha2 with builtin interactive support.  It's actually a
>> completely default installation with my PE added in.
>>
>>
>>> How many records do you have in the qacct output for the job: one or
>>> more?
>>>
>>>
>>
>> For each of the jobs in both the emails I wrote there is only one entry
>> in accounting:
>>
>>
>>> What is the setting of "accounting_summary" in the PE?
>>>
>>>
>>
>> In both PEs, the accounting_summary is TRUE.  And control_slave is TRUE,
>> and job_is_first_task is FALSE.
>>
>> Thanks,
>> Daniel
>>
>>
>>> -- Reuti
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=226688
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>
>>>
>>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=226703

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


