[GE users] Arco tool results differ from qacct

Jana Olivova Jana.Olivova at Sun.COM
Mon May 21 18:49:46 BST 2007



Hi Chansup,

Hmm, it does not look that way. The sge_job_usage table has the fields
ju_failed and ju_exit_status, so jobs with a non-zero exit status are
recorded too, and view_accounting does not filter them out.
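
One way to check this on a given installation is to split the totals by
job outcome. Below is a minimal sketch in Python, using an in-memory
SQLite table as a stand-in for the ARCo database. The rows are made up,
the helper name mem_by_outcome is hypothetical, and it assumes that
ju_failed and ju_exit_status are integers where 0 means success.

```python
import sqlite3

# In-memory SQLite stands in for the ARCo database. Only the three
# columns relevant here are modelled, and every row is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sge_job_usage ("
    "ju_failed INTEGER, ju_exit_status INTEGER, ju_mem REAL)"
)
conn.executemany(
    "INSERT INTO sge_job_usage VALUES (?, ?, ?)",
    [
        (0, 0, 10.0),     # clean job
        (0, 1, 2.5),      # ran, but exited non-zero
        (100, 137, 4.0),  # hypothetical failure code
        (0, 0, 7.5),      # clean job
    ],
)


def mem_by_outcome(connection):
    """Sum ju_mem separately for clean jobs and for all other jobs."""
    rows = connection.execute(
        "SELECT CASE WHEN ju_failed = 0 AND ju_exit_status = 0 "
        "            THEN 'ok' ELSE 'not_ok' END AS outcome, "
        "       SUM(ju_mem) "
        "FROM sge_job_usage GROUP BY outcome"
    ).fetchall()
    return dict(rows)


totals = mem_by_outcome(conn)
print(totals)
```

If the 'not_ok' sum is non-zero on the real database, the table clearly
holds failed jobs as well, so they cannot explain a gap against qacct.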

Jana

Chansup Byun wrote:
> Hi Jana,
>
> I could be wrong, but if I remember correctly the sge_job_usage table 
> in ARCo only stores jobs that completed successfully.
> qacct, however, also reports jobs that failed with errors.
>
> Regards,
>
> - Chansup
>
> Jana Olivova wrote:
>> Hi,
>>
>> I don't see anything wrong with the query. You can also use the 
>> predefined Accounting per Department query, which does the same.
>>
>> I checked my setup with MySQL database and I get the same results 
>> with both ARCo and qacct. I don't have any sensible data in my 
>> Postgres db, because I was using the same grid with 3 different 
>> databases. So the only month I can compare is this one:
>>
>> qacct -b 200705010000 -e 200705312359
>> Total System Usage
>>     WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
>> ================================================================================================================
>>        889909             2            36           415              0.275              0.000              0.000
>>
>> ARCo Accounting per Department
>>
>> 2007-05-01
>> department            cpu            mem                   io
>> defaultdepartment     415.155821     0.275125999999997     0.0
>>
>>
>> One explanation for this, of course, would be that the same 
>> database is used for several grids and/or (for February) that 
>> reporting was not enabled the whole time. I am not sure whether that 
>> is a likely scenario for you.
>>
>> Regards,
>>
>> Jana
>>
>> John Mc-Nicholas XJ (GU/ETL) wrote:
>>> Hi Jana/Daniel
>>>
>>> In this case I query the table sge_job_usage, but I have also used the
>>> accounting view.
>>> I assume qacct groups jobs according to their start time? I've done
>>> the same in the SQL query.
>>> So this SQL should total up the memory (GB-seconds) for all the jobs
>>> started within each month.
>>>
>>>
>>> SQL:
>>> SELECT date_trunc('month', ju_start_time) AS month,
>>>        SUM(ju_mem) AS mem
>>>   FROM sge_job_usage
>>>  WHERE ju_start_time > (current_timestamp - interval '1 year')
>>>  GROUP BY month
>>>  ORDER BY month;
>>> Resulting table:
>>> month                  mem
>>> 2007-02-01 00:00:00.0  532138.750717
>>> 2007-03-01 00:00:00.0  5274933.144317
>>> 2007-04-01 00:00:00.0  6884688.555405
>>> 2007-05-01 00:00:00.0  2789895.540273
>>> Here are the results from the qacct command. Compare the MEMORY column
>>> to the table above.
>>> The results differ by a significant amount. A query on ju_cpu results
>>> in a similar discrepancy.
>>> qacct:
>>> johnick at seasub1[~]# qacct -b 200702010000 -e 200702312359
>>> Total System Usage
>>>     WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
>>> ================================================================================================================
>>>       2433584        289462        131581        854446         567582.583              0.000              0.000
>>> johnick at seasub1[~]# qacct -b 200703010000 -e 200703312359
>>> Total System Usage
>>>     WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
>>> ================================================================================================================
>>>       4753132       1041297         53389       2957120        3923641.991              0.000              0.000
>>> johnick at seasub1[~]# qacct -b 200704010000 -e 200704312359
>>> Total System Usage
>>>     WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
>>> ================================================================================================================
>>>       6118415       2063020        140069       4094226        5743492.079              0.000              0.000
>>> johnick at seasub1[~]# qacct -b 200705010000 -e 200705312359
>>> Total System Usage
>>>     WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
>>> ================================================================================================================
>>>       2746486        983188        156462       1761848        2388992.294              0.000              0.000
>>>
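
To put numbers on the gap, here is a minimal Python sketch that computes
the relative difference per month from the memory figures quoted above.
The dictionary names and the helper relative_diff_percent are made up
for illustration; only the values are taken from the SQL result and the
qacct MEMORY column.

```python
# Monthly memory totals (GB-seconds) copied from the outputs quoted above.
arco = {
    "2007-02": 532138.750717,
    "2007-03": 5274933.144317,
    "2007-04": 6884688.555405,
    "2007-05": 2789895.540273,
}
qacct = {
    "2007-02": 567582.583,
    "2007-03": 3923641.991,
    "2007-04": 5743492.079,
    "2007-05": 2388992.294,
}


def relative_diff_percent(arco_value, qacct_value):
    """Difference of the ARCo value from the qacct value, in percent."""
    return (arco_value - qacct_value) / qacct_value * 100.0


diffs = {
    month: relative_diff_percent(arco[month], qacct[month])
    for month in arco
}
for month in sorted(diffs):
    print(f"{month}: {diffs[month]:+.1f}%")
```

By this measure the ARCo total is lower than qacct for February and
higher for the other three months, so a single constant factor such as a
unit conversion would not explain all four months.
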
>>>
>>>
>>> -----Original Message-----
>>> From: Jana.Olivova at Sun.COM [mailto:Jana.Olivova at Sun.COM]
>>> Sent: 18 May 2007 18:58
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] Arco tool results differ from qacct
>>>
>>> I have trouble reproducing the issue, though. I keep running jobs
>>> (using Maintrunk GE) and the numbers keep matching.
>>>
>>> Jana
>>>
>>> Daniel Templeton wrote:
>>>> It may be worth noting that qacct and ARCo use different source 
>>>> data files.  qacct uses the accounting file, and ARCo uses the 
>>>> reporting file.  It is not inconceivable that there could be an 
>>>> issue such that the qmaster might write different data to the two 
>>>> files in some cases.
>>>>
>>>> Just a thought.
>>>>
>>>> Daniel
>>>>
>>>> Jana Olivova wrote:
>>>>> Hi John,
>>>>>
>>>>> I could check on the ARCo side. I have checked my data and the 
>>>>> numbers are the same in both, except for the rounding that appears 
>>>>> in qacct. I do have, however, a very small sample of data. Frankly, 
>>>>> I am not sure what would cause this. ARCo only inserts the data 
>>>>> that is given to it by the qmaster, in the reporting file.
>>>>>
>>>>> Can you tell me what SQL query you used to obtain the data in ARCo
>>>>> and what database you are using?
>>>>>
>>>>> Jana Olivova
>>>>>
>>>>> John Mc-Nicholas XJ (GU/ETL) wrote:
>>>>>> Hi All
>>>>>>
>>>>>> I am basically having the same problem that Todd Heywood had earlier
>>>>>> in the year. He gave up on the ARCo tool in the end; I hope I don't
>>>>>> have to do the same.
>>>>>>
>>>>>>> Heywood, Todd wrote:
>>>>>>>> How does ARCo report time and memory? I assumed it would be the
>>>>>>>> same as for qacct, for which it is seconds and Gbytes (according
>>>>>>>> to "man accounting"). But qacct and ARCo are reporting different
>>>>>>>> numbers. Unit conversions don't account for the diffs.
>>>>>>
>>>>>> The ARCo tool produces nice graphs and the SQL works fine, but when I
>>>>>> compare it to the output of qacct, I get a completely different set
>>>>>> of results.
>>>>>>
>>>>>> There is some correlation between the data. For example, April's 
>>>>>> usage is the highest in both sets of results, and the users with the 
>>>>>> most usage also correspond in both sets of data.
>>>>>> But the actual figures seem to be off, apparently at random, by
>>>>>> about 20-30%.
>>>>>>
>>>>>> I'm specifically trying to extract the memory usage of grid jobs
>>>>>> (gigabyte-seconds) per month. For example, the data for April:
>>>>>>
>>>>>> qacct -b 200704010000 -e 200704312359    MEMORY 5743492.079
>>>>>>
>>>>>> But the output in ARCo gives 6324866.240448.
>>>>>>
>>>>>> Is this a bug in ARCo or Grid Engine?
>>>>>> What would cause this behaviour?
>>>>>>
>>>>>> The only strange thing I've noticed is that I have 2 dbwriter 
>>>>>> processes instead of 1, and 5 postmaster processes instead of 3.
>>>>>>
>>>>>>
>>>>>> sgeadm    1430  1422 0   May 10 ? 0:00 /bin/sh /grid/dbwriter/util/dbwriter.sh
>>>>>> sgeadm    1422     1 0   May 10 ? 0:00 /bin/sh /grid/dbwriter/util/dbwriter.sh
>>>>>> postgres  1402  1401 0   May 10 ? 0:00 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>>>>> postgres  1403  1402 0   May 10 ? 0:01 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>>>>> postgres  1401     1 0   May 10 ? 0:04 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>>>>> postgres 13303  1401 0 16:29:34 ? 0:00 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>>>>> postgres  9719  1401 0 14:31:33 ? 0:20 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/database -S
>>>>>>
>>>>>> If you've any ideas please get back to me & I'll give you more 
>>>>>> detailed info.
>>>>>>
>>>>>> Best Regards
>>>>>>
>>>>>> John
>>>>>> John Mc Nicholas
>>>>>> STE/SEA Support Engineer
>>>>>> BETE Test Plants UK
>>>>>>
>>>>>> Phone: +44 (0) 1483 305458
>>>>>> Email: john.xj.mc-nicholas at ericsson.com
>>>>>> Address: Ericsson, Midleton Gate, Guildford Business Park,
>>>>>> Guildford, Surrey, GU2 8SG, UK
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net






