[GE users] Arco tool results differ from qacct

John Mc-Nicholas XJ (GU/ETL) john.xj.mc-nicholas at ericsson.com
Mon May 21 10:21:21 BST 2007


Hi Jana/Daniel

In this case I queried the sge_job_usage table, but I have also used the
accounting database.
qacct groups jobs according to their start time, doesn't it? I've done the
same in the SQL query,
so this SQL should total up the memory (GB seconds) for all the jobs started
within each month.


SQL:
SELECT date_trunc('month', ju_start_time) AS month,
SUM (ju_mem) AS "mem"  
FROM sge_job_usage 
WHERE ju_start_time  >  (current_timestamp - interval '1 year') 
GROUP BY month
ORDER BY month; 
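
The query above buckets jobs by calendar month via date_trunc. The qacct runs later in this message need explicit -b/-e bounds for the same buckets, and two of them use end bounds (200702312359 and 200704312359) that are not real calendar dates. How qacct interprets such a value is not something verified here. A minimal sketch, assuming the CCYYMMDDhhmm form used in those commands, that builds bounds ending on the actual last day of each month (the helper name qacct_window is hypothetical):

```python
import calendar


def qacct_window(year, month):
    """Return (begin, end) strings in the CCYYMMDDhhmm form seen in this thread.

    Hypothetical helper: it only formats strings and does not call qacct.
    """
    # monthrange returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    begin = f"{year:04d}{month:02d}010000"
    end = f"{year:04d}{month:02d}{last_day:02d}2359"
    return begin, end


# Print one qacct command line per month covered by the SQL result
for month in (2, 3, 4, 5):
    begin, end = qacct_window(2007, month)
    print(f"qacct -b {begin} -e {end}")
```

Re-running qacct with these bounds would at least rule out the invalid end dates as a contributor to the mismatch.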

Resulting table:

month                    mem
2007-02-01 00:00:00.0    532138.750717
2007-03-01 00:00:00.0    5274933.144317
2007-04-01 00:00:00.0    6884688.555405
2007-05-01 00:00:00.0    2789895.540273

Here are the results from the qacct command. Compare the MEMORY column to
the table above.
The results differ by a significant amount. A query on ju_cpu results in
a similar discrepancy.
qacct:

johnick at seasub1[~]# qacct -b 200702010000 -e 200702312359
Total System Usage
    WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
================================================================================================================
      2433584        289462        131581        854446         567582.583              0.000              0.000

johnick at seasub1[~]# qacct -b 200703010000 -e 200703312359
Total System Usage
    WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
================================================================================================================
      4753132       1041297         53389       2957120        3923641.991              0.000              0.000

johnick at seasub1[~]# qacct -b 200704010000 -e 200704312359
Total System Usage
    WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
================================================================================================================
      6118415       2063020        140069       4094226        5743492.079              0.000              0.000

johnick at seasub1[~]# qacct -b 200705010000 -e 200705312359
Total System Usage
    WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
================================================================================================================
      2746486        983188        156462       1761848        2388992.294              0.000              0.000
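
To put numbers on the gap, the two sets of monthly MEMORY totals above can be compared directly. A minimal sketch in Python, using only the figures quoted in this message; the function names are made up for illustration:

```python
# Monthly MEMORY totals copied from this message (GB seconds).
arco = {
    "2007-02": 532138.750717,
    "2007-03": 5274933.144317,
    "2007-04": 6884688.555405,
    "2007-05": 2789895.540273,
}
qacct = {
    "2007-02": 567582.583,
    "2007-03": 3923641.991,
    "2007-04": 5743492.079,
    "2007-05": 2388992.294,
}


def relative_difference(arco_total, qacct_total):
    """Return (ARCo - qacct) / qacct as a fraction of the qacct total."""
    return (arco_total - qacct_total) / qacct_total


def compare(arco_totals, qacct_totals):
    """Return (month, arco, qacct, relative difference) rows sorted by month."""
    rows = []
    for month in sorted(arco_totals):
        a = arco_totals[month]
        q = qacct_totals[month]
        rows.append((month, a, q, relative_difference(a, q)))
    return rows


for month, a, q, diff in compare(arco, qacct):
    print(f"{month}  arco={a:15.3f}  qacct={q:15.3f}  diff={diff:+.1%}")
```

The sign of the difference is not the same for every month: the SQL total is below qacct for February and above it for the other three months, so a single unit conversion factor would not explain the mismatch.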



-----Original Message-----
From: Jana.Olivova at Sun.COM [mailto:Jana.Olivova at Sun.COM] 
Sent: 18 May 2007 18:58
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Arco tool results differ from qacct

I am having trouble replicating the issue, though. I keep running jobs (using
the maintrunk GE) and the numbers keep matching.

Jana

Daniel Templeton wrote:
> It may be worth noting that qacct and ARCo use different source data 
> files.  qacct uses the accounting file, and ARCo uses the reporting 
> file.  It is not inconceivable that there could be an issue such that 
> the qmaster might write different data to the two files in some cases.
>
> Just a thought.
>
> Daniel
>
> Jana Olivova wrote:
>> Hi John,
>>
>> I could check on the ARCo side. I have checked my data and they are 
>> both the same, except for the rounding that appears in qacct. I do have, 
>> however, a very small sample of data. Frankly, I am not sure what would
>> cause this. ARCo only inserts the data that is given to it by the 
>> qmaster in the reporting file.
>>
>> Can you tell me which SQL query you used to obtain the data in ARCo
>> and which database you are using?
>>
>> Jana Olivova
>>
>> John Mc-Nicholas XJ (GU/ETL) wrote:
>>>
>>> Hi All
>>>
>>> I am basically having the same problem that Todd Heywood had earlier
>>> in the year.
>>> He gave up on the ARCo tool in the end; I hope I don't have to do the 
>>> same.
>>>
>>> > Heywood, Todd wrote:
>>> >> How does ARCo report time and memory? I assumed it would be the
>>> >> same as for qacct, for which it is seconds and Gbytes (according
>>> >> to "man accounting"). But qacct and ARCo are reporting different
>>> >> numbers. Unit conversions don't account for the diffs
>>>
>>> The ARCo tool produces nice graphs and the SQL works fine, but when I
>>> compare it to the output of qacct, it is a completely different set of
>>> results.
>>>
>>> There is some correlation between the data. For example, April's 
>>> usage is the highest in both sets of results, and the users with the 
>>> most usage also correspond in both sets of data.
>>> But the actual data seems to be randomly out by around 20-30%.
>>>
>>> I'm specifically trying to extract grid jobs' memory (gigabyte
>>> seconds) per month.
>>> For example, the data for April:
>>> qacct -b 200704010000 -e 200704312359 MEMORY 5743492.079
>>>
>>> But the output in ARCo gives:
>>> 6324866.240448
>>>
>>> Is this a bug in ARCo/Grid Engine?
>>> What would cause this behaviour?
>>>
>>> The only strange thing I've noticed is that I have 2 dbwriter 
>>> processes instead of 1 and 5 postmaster processes instead of 3.
>>>
>>>
>>> sgeadm 1430 1422 0 May 10 ? 0:00 /bin/sh /grid/dbwriter/util/dbwriter.sh
>>> sgeadm 1422 1 0 May 10 ? 0:00 /bin/sh /grid/dbwriter/util/dbwriter.sh
>>> postgres 1402 1401 0 May 10 ? 0:00 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>> postgres 1403 1402 0 May 10 ? 0:01 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>> postgres 1401 1 0 May 10 ? 0:04 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>> postgres 13303 1401 0 16:29:34 ? 0:00 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>> postgres 9719 1401 0 14:31:33 ? 0:20 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>>
>>> If you've any ideas please get back to me & I'll give you more 
>>> detailed info.
>>>
>>> Best Regards
>>>
>>> John
>>> */ John Mc Nicholas /*
>>>
>>> * STE/SEA Support Engineer *
>>> * BETE Test Plants UK *
>>> Phone: +44 (0) 1483 305458
>>> Email: john.xj.mc-nicholas at ericsson.com
>>> Address: Ericsson, Midleton Gate, Guildford Business Park, 
>>> Guildford, Surrey, GU2 8SG , UK
>>>
>>>
>>>
>>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>   