[GE users] Arco tool results differ from qacct

Chansup Byun Chansup.Byun at Sun.COM
Mon May 21 16:51:44 BST 2007



Hi Jana,

I could be wrong, but if I remember correctly the sge_job_usage table in
ARCo stores only jobs that completed successfully.
qacct, however, also counts jobs that failed with errors.

Regards,

- Chansup
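
One way to probe this hypothesis is to split the accounting file totals by the failed field and see whether the failed jobs account for the gap. The following is a minimal Python sketch, not a tested tool. It assumes the colon-separated record layout described in accounting(5), with failed as the 12th field and mem as the 38th; both positions are assumptions and should be checked against the man page for your Grid Engine version. The function name mem_by_outcome is made up for illustration.

```python
from collections import defaultdict

# Assumed 0-based field positions in a colon-separated accounting record,
# following the field order listed in accounting(5). Verify them against
# your Grid Engine version before trusting the totals.
FAILED_FIELD = 11
MEM_FIELD = 37


def mem_by_outcome(lines, failed_field=FAILED_FIELD, mem_field=MEM_FIELD):
    """Sum the mem field separately for records with failed == 0 and failed != 0."""
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split(":")
        if len(fields) <= max(failed_field, mem_field):
            continue  # skip short or malformed records
        key = "ok" if fields[failed_field] == "0" else "failed"
        totals[key] += float(fields[mem_field])
    return dict(totals)
```

To use it, pass an open file object for the accounting file (typically found under the cell's common directory) to mem_by_outcome. If the "ok" total is close to the ARCo figure and the "failed" total is close to the difference, that would support the explanation above; if not, the cause lies elsewhere.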

Jana Olivova wrote:
> Hi,
>
> I don't see anything wrong with the query. You can also use the 
> predefined Accounting per Department query, which does the same.
>
> I checked my setup with a MySQL database and I get the same results with
> both ARCo and qacct. I don't have any sensible data in my Postgres db,
> because I was using the same grid with 3 different databases. So the
> only month I can compare is this one:
>
> qacct -b 200705010000 -e 200705312359
> Total System Usage
>     WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
> ================================================================================================================
>        889909             2            36           415              0.275              0.000              0.000
>
> ARCo Accounting per Department
>
> 2007-05-01
> cpu 	mem 	io
> defaultdepartment 	415.155821 	0.275125999999997 	0.0
>
>
> One explanation for this, of course, would be that the same database
> is used for more than one grid and/or (for February) that reporting was
> not enabled the whole time. Not sure if that is a likely scenario for you.
>
> Regards,
>
> Jana
>
> John Mc-Nicholas XJ (GU/ETL) wrote:
>> Hi Jana/Daniel
>>
>> In this case I use the sge_job_usage table, but I have also used the
>> accounting database.
>> Does qacct group jobs according to the job's start time? I've done the
>> same for the SQL query.
>> So this SQL should total up the memory (gigabyte-seconds) for all the
>> jobs started within each month.
>>
>>
>> SQL:
>> SELECT date_trunc('month', ju_start_time) AS month,
>> SUM (ju_mem) AS "mem"
>> FROM sge_job_usage 
>> WHERE ju_start_time  >  (current_timestamp - interval '1 year') 
>> GROUP BY month
>> ORDER BY month; 
>>
>> Resulting table:
>> month                   mem
>> 2007-02-01 00:00:00.0   532138.750717
>> 2007-03-01 00:00:00.0   5274933.144317
>> 2007-04-01 00:00:00.0   6884688.555405
>> 2007-05-01 00:00:00.0   2789895.540273
>>
>> Here are the results from qacct command. Compare the MEMORY column to
>> table above.
>> The results differ by a significant amount. A query on ju_cpu results in
>> a similar discrepancy.
>> qacct:
>> johnick at seasub1[~]# qacct -b 200702010000 -e 200702312359
>> Total System Usage
>>     WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
>> ================================================================================================================
>>       2433584        289462        131581        854446         567582.583              0.000              0.000
>> johnick at seasub1[~]# qacct -b 200703010000 -e 200703312359
>> Total System Usage
>>     WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
>> ================================================================================================================
>>       4753132       1041297         53389       2957120        3923641.991              0.000              0.000
>> johnick at seasub1[~]# qacct -b 200704010000 -e 200704312359
>> Total System Usage
>>     WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
>> ================================================================================================================
>>       6118415       2063020        140069       4094226        5743492.079              0.000              0.000
>> johnick at seasub1[~]# qacct -b 200705010000 -e 200705312359
>> Total System Usage
>>     WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
>> ================================================================================================================
>>       2746486        983188        156462       1761848        2388992.294              0.000              0.000
>>
>>
>>
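
The size of the gap between the two sets of monthly MEMORY totals quoted above can be worked out directly. The short Python sketch below uses only the figures from this message (the SUM(ju_mem) results and the qacct MEMORY column) and expresses the ARCo total as a percentage above or below the qacct total. The helper name relative_diff is made up for illustration.

```python
# Monthly MEMORY totals copied from the message above:
# (ARCo SUM(ju_mem), qacct MEMORY column).
totals = {
    "2007-02": (532138.750717, 567582.583),
    "2007-03": (5274933.144317, 3923641.991),
    "2007-04": (6884688.555405, 5743492.079),
    "2007-05": (2789895.540273, 2388992.294),
}


def relative_diff(arco, qacct):
    """Return how far the ARCo total is from the qacct total, in percent."""
    return (arco - qacct) / qacct * 100.0


for month, (arco, qacct) in totals.items():
    print(f"{month}  {relative_diff(arco, qacct):+6.1f}%")
```

On these figures ARCo is roughly 6% below qacct for February and roughly 34%, 20% and 17% above it for March, April and May. The difference changes in both sign and size from month to month, so a single unit conversion factor does not seem to explain it.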
>> -----Original Message-----
>> From: Jana.Olivova at Sun.COM [mailto:Jana.Olivova at Sun.COM] 
>> Sent: 18 May 2007 18:58
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Arco tool results differ from qacct
>>
>> I am having trouble replicating the issue, though. I keep running jobs
>> (using Maintrunk GE) and the numbers keep matching.
>>
>> Jana
>>
>> Daniel Templeton wrote:
>>> It may be worth noting that qacct and ARCo use different source data 
>>> files.  qacct uses the accounting file, and ARCo uses the reporting 
>>> file.  It is not inconceivable that there could be an issue such that 
>>> the qmaster might write different data to the two files in some cases.
>>>
>>> Just a thought.
>>>
>>> Daniel
>>>
>>> Jana Olivova wrote:
>>>     
>>>> Hi John,
>>>>
>>>> I could check on the Arco side. I have checked my data and they are 
>>>> both the same, except the rounding that appears in qacct. I do have, 
>>>> however, very small sample of data. Frankly, I am not sure what would
>>>>       
>>
>>   
>>>> cause this. Arco only inserts the data that is given to it by the 
>>>> qmaster, in the reporting file.
>>>>
>>>> Can you tell me what sql query did you use to obtain the data in ARCo
>>>>       
>>
>>   
>>>> and what database are you using?
>>>>
>>>> Jana Olivova
>>>>
>>>> John Mc-Nicholas XJ (GU/ETL) wrote:
>>>>       
>>>>> Hi All
>>>>>
>>>>> I am basically having the same problem that Todd Heywood had earlier
>>>>>         
>>
>>   
>>>>> in the year.
>>>>> He gave up on Arco tool in the end , I hope I haven't got to do the 
>>>>> same.
>>>>>
>>>>>         
>>>>>> / Heywood, Todd wrote:/ >/> How does ACRo report time and memory? I
>>>>>>           
>>>>> assumed it would be the same as/ >/> for qacct, for which it is 
>>>>> seconds and Gbytes (according to "man/ >/> accounting"). But qacct 
>>>>> and ACRo are reporting different numbers. Unit/ >/> conversions 
>>>>> don't account for the diffs/
>>>>>
>>>>> The ARCo tool produces nice graphs and the SQL works fine, but when
>>>>> I compare it to the output of qacct, it is a completely different
>>>>> set of results.
>>>>>
>>>>> There is some correlation between the data. For example, April's
>>>>> usage is the highest in both sets of results, and the users with the
>>>>> most usage also correspond in both sets of data.
>>>>> But the actual data seems to be randomly out by something on the
>>>>> order of 20-30%.
>>>>>
>>>>> I'm specifically trying to extract the grid jobs' memory usage
>>>>> (gigabyte-seconds) per month.
>>>>> For example, the data for April:
>>>>> qacct -b 200704010000 -e 200704312359 MEMORY 5743492.079
>>>>>
>>>>> But the output in ARCo gives:
>>>>> 6324866.240448
>>>>>
>>>>> Is this a bug in ARCO/GRID ?
>>>>> What would cause this behaviour?
>>>>>
>>>>> The only strange thing I've noticed is that I have 2 dbwriter
>>>>> processes instead of 1, and 5 postmaster processes instead of 3.
>>>>>
>>>>>
>>>>> sgeadm    1430  1422 0 May 10   ? 0:00 /bin/sh /grid/dbwriter/util/dbwriter.sh
>>>>> sgeadm    1422     1 0 May 10   ? 0:00 /bin/sh /grid/dbwriter/util/dbwriter.sh
>>>>> postgres  1402  1401 0 May 10   ? 0:00 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>>>> postgres  1403  1402 0 May 10   ? 0:01 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>>>> postgres  1401     1 0 May 10   ? 0:04 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>>>> postgres 13303  1401 0 16:29:34 ? 0:00 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>>>> postgres  9719  1401 0 14:31:33 ? 0:20 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>>>>
>>>>> If you've any ideas please get back to me & I'll give you more 
>>>>> detailed info.
>>>>>
>>>>> Best Regards
>>>>>
>>>>> John
>>>>> John Mc Nicholas
>>>>>
>>>>> STE/SEA Support Engineer
>>>>> BETE Test Plants UK
>>>>>
>>>>> Phone: +44 (0) 1483 305458
>>>>> Email: john.xj.mc-nicholas at ericsson.com
>>>>> Address: Ericsson, Midleton Gate, Guildford Business Park, 
>>>>> Guildford, Surrey, GU2 8SG , UK
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
