[GE users] Sun Grid Engine 6.2 - ARCO dbwriter issue

Jana Olivova Jana.Olivova at Sun.COM
Mon Sep 8 08:57:32 BST 2008



Hi Karen,

I am not sure why it is taking that long to delete the data; you 
obviously don't have that much old data in the database. Since you did 
a cloned cluster upgrade, did you leave the old cluster running? 
You noticed that the (old) dbwriter was not running, which may 
mean that the process was stopped, or that it was running but not 
inserting into the database because of a database connection error. Did 
you actually stop the old dbwriter before installing the new one? 
There cannot be two dbwriter processes writing into the same database 
schema. Do you have the end of the old dbwriter log file?
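
For a rough sense of the deletion rate: the debug log quoted further 
down shows one SELECT/DELETE cycle of 500 rows running from 15:33:07 to 
15:34:35. The Python sketch below turns that into an estimate. The 
helper names are made up for illustration, and it assumes every batch 
takes about as long as that single sample, which may not hold.

```python
import math
from datetime import datetime

# Timestamp layout used in the dbwriter log lines (day/month/year).
LOG_TIME_FORMAT = "%d/%m/%Y %H:%M:%S"


def batch_seconds(start, end):
    """Seconds elapsed between two dbwriter log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


def hours_to_delete(rows, batch_size, secs_per_batch):
    """Estimated hours to delete `rows` rows in fixed-size batches.

    Assumption: each batch costs the same time as the sampled one.
    """
    batches = math.ceil(rows / batch_size)
    return batches * secs_per_batch / 3600.0


# One cycle taken from the quoted debug log (limit 500).
secs = batch_seconds("05/09/2008 15:33:07", "05/09/2008 15:34:35")
print(secs)                                    # 88.0
print(round(hours_to_delete(287000, 500, secs), 1))
```

By this estimate the 287,000 rows removed overnight correspond to 
roughly 14 hours of batches, so the deletion looks slow but steady 
rather than stuck.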

At this point I would suggest the following:

1. on dbwriter host, source the cluster settings.sh (or .csh) file
2. stop the dbwriter:
3. comment out or delete the deletion rules in 
$SGE_ROOT/dbwriter/database/mysql/dbwriter.xml (save a copy of them 
somewhere else for later). Do not delete the file itself, just remove 
the deletion rules. You can always delete the outdated values yourself 
later.
4. If you do not need the old reporting.processing file (it should 
contain just the host values, which are generated even when no jobs are 
running), just delete it.
5. change the debug level of dbwriter back to INFO (an increased debug 
level also slows dbwriter down).
6. start dbwriter.
7. You can monitor whether dbwriter is inserting lines by querying the 
sge_checkpoint table (select * from sge_checkpoint); the value (ch_line) 
should be increasing.
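
Step 7 can be scripted. Here is a minimal Python sketch, assuming you 
already have a DB-API connection to the arco database (the MySQL driver 
is your choice and is not shown). The function names are hypothetical; 
only the table sge_checkpoint and the column ch_line come from the 
query above.

```python
import time


def read_ch_line(conn):
    """Return the largest ch_line in sge_checkpoint, or None if empty."""
    cur = conn.cursor()
    cur.execute("SELECT MAX(ch_line) FROM sge_checkpoint")
    row = cur.fetchone()
    cur.close()
    # End the read transaction so the next poll sees fresh data
    # instead of an old snapshot.
    conn.rollback()
    return row[0] if row else None


def watch(conn, polls=3, interval=60.0):
    """Sample ch_line `polls` times, sleeping `interval` seconds between."""
    samples = []
    for i in range(polls):
        samples.append(read_ch_line(conn))
        if i < polls - 1:
            time.sleep(interval)
    return samples


def is_advancing(samples):
    """True if the last non-empty sample is greater than the first one."""
    values = [s for s in samples if s is not None]
    return len(values) >= 2 and values[-1] > values[0]
```

Call watch(conn) and pass the result to is_advancing(). If it stays 
False over several minutes while the reporting file keeps growing, 
dbwriter is probably not processing lines.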

Hope that helps.

Jana

On 09/05/08 22:38, Karen Magee wrote:
> See inline ..
> -----------------
> On Fri, Sep 05, 2008 at 07:00:17PM +0200, Jana Olivova wrote:
>   
>>>> Is there anything useful in the dbwriter log file? 
>>>> ($SGE_ROOT/$SGE_CELL/spool/dbwriter/dbwriter.log), any errors, exceptions?
>>>>
>>>>    
>>>>         
>>> unfortunately, no...just what looks to me to be a normal successful startup...
>>>
>>> 04/09/2008 
>>> 11:14:06|dnode0-bkp.mayo.edu|.ReportingDBWriter.initLogging|I|Starting up 
>>> dbwriter (Version 6.2) ---------------------------
>>> 04/09/2008 
>>> 11:14:06|dnode0-bkp.mayo.edu|r.ReportingDBWriter.initialize|I|Connection 
>>> to db jdbc:mysql://rcfclusterdb.mayo.edu:3306/arco
>>> 04/09/2008 
>>> 11:14:06|dnode0-bkp.mayo.edu|r.ReportingDBWriter.initialize|I|Found 
>>> database model version 8
>>> 04/09/2008 
>>> 11:14:07|dnode0-bkp.mayo.edu|er.file.FileParser.processFile|I|Renaming 
>>> reporting  to reporting.processing
>>> 04/09/2008 11:14:07|dnode0-bkp.mayo.edu|iter.file.FileParser.parseFile|W|0 
>>> lines marked as erroneous, these will be skipped
>>> 04/09/2008 
>>> 11:14:07|dnode0-bkp.mayo.edu|tingDBWriter.getDbWriterConfig|I|calculation 
>>> file /home/sge6_2/dbwriter/database/mysql/dbwriter.xml has changed, reread 
>>> it
>>> 04/09/2008 
>>> 11:14:13|dnode0-bkp.mayo.edu|ngDBWriter$StatisticThread.run|I|Next 
>>> statistic calculation will be done at 9/4/08 12:14 PM
>>> 04/09/2008 
>>> 11:14:31|dnode0-bkp.mayo.edu|rtingDBWriter.logEventDuration|I|calculating 
>>> derived values took 0 hours 0 minutes
>>>  
>>>       
>> Is this the end of the log, or is there more and you just copied a 
>> snippet? There should also be lines in the log file about the deletion 
>> time: 'deleting outdated values took X hours X minutes'. Are these 
>> messages there? You can also check the Performance Query in the ARCo 
>> web console, which shows the same information. Are there any 
>> lines in the log that say 'processed X   lines in X minutes' ?
>>     
>
> ..that is the entire log ...
> The dbwriter query yields no fields with 'processed X   lines in X minutes'
>
>   
>>>  
>>>       
>>>> How have you figured out that it is taking a long time to remove the 
>>>> old  records?
>>>>
>>>>    
>>>>         
>>> just a guess from looking at the size (row count) of the sge_host_values table.
>>> It's decreasing...and the select command that PHP MySQL sees has that table
>>> in it..After running overnight, we've seen a drop of 287,000 records in
>>> the sge_host_values record count...but I haven't seen any of the "new"
>>> data from jobs that have run in the last week or so show up yet..
>>>  
>>>       
>> This is not an indication that anything is wrong. In the deletion rules 
>> file the deletion for some host_values is set to 7 days. So, if the 
>> dbwriter was not running for some 25 days, it would delete a 
>> lot of records at once after restart. See: 
>> http://wikis.sun.com/display/GridEngine/Derived+Values+and+Deletion+Rules. 
>> Is no new data being inserted in *any* tables? Check the sge_job table.
>>     
>>>  
>>>       
>
> Nothing going into the sge_job table - But I shouldn't expect it if
> it's still doing the cleanup before it starts looking at the reporting file...
>
> right?
>
>   
>>> -rw-r--r--  1 sgeadmin sgeadmin  27109033 Sep  5 09:50 reporting
>>> -rw-r--r--  1 sgeadmin sgeadmin 360673112 Sep  4 11:13 reporting.processing
>>>
>>> [root@dnode0 common]# wc -l reporting*
>>>   235247 reporting
>>>  3138846 reporting.processing
>>>
>>> I'm concerned that I'm in a vicious cycle with the reporting file growing 
>>> and
>>> the reporting.processing file not finishing up.....and when it does we'll
>>> be back to the same thing again..
>>>  
>>>       
>> Hmm, if the log that you have shown me is the whole log, then it looks 
>> like dbwriter is stuck somewhere. Try stopping the dbwriter, increasing 
>> the debug level in the dbwriter.conf file, and starting it again to see 
>> if there is anything else in the log.
>>
>>     
> I've restarted with debugging...it's working on stuff...this query over and
> over...about 1.5 minutes apiece..
>
> 05/09/2008 15:33:07|dnode0-bkp.mayo.edu|riter.db.Database.executeQuery|D|Execute sql: SELECT hv_id FROM sge_host_values WHERE hv_time_end < {ts '2008-08-15 11:00:00.0'} AND hv_variable IN ('np_load_avg', 'cpu', 'mem_free', 'virtual_free') limit 500
> 05/09/2008 15:34:35|dnode0-bkp.mayo.edu|iter.db.Database.executeUpdate|D|Execute sql: DELETE FROM sge_host_values WHERE hv_id IN (115390991,115390993,115390997,115391000,115391004,115391006,115391010,115391012,115391016,115391020,115391024,115391028,115391032,115391037,115391041,115391043,115391047,115391050,115391054,115391057,115391061,115391063,115391067,115391070,115391074,115391076,115391080,115391082,115391086,115391088,115391092,115391097,115391101,115391103,115391107,115391109,115391113,115391115,115391119,115391121,115391125,115391127,115391131,115391135,115391139,115391141,115391145,115391147,115391151,115391153,115391157,115391160,115391164,115391166,115391170,115391174,115391178,115391180,115391184,115391187,115391191,115391194,115391198,115391200,115391204,115391206,115391210,115391213,115391217,115391219,115391223,115391225,115391229,115391233,115391237,115391240,115391244,115391246,115391250,115391253,115391257,115391260,115391264,115391266,115391270,115391272,115391276,115391278,115391282,115391284,115391288,115391290,115391294,115391297,115391301,115391303,115391307,115391309,115391313,115391316,115391320,115391322,115391326,115391328,115391332,115391334,115391338,115391340,115391344,115391346,115391350,115391353,115391355,115391357,115391361,115391363,115391367,115391369,115391373,115391375,115391379,115391381,115391385,115391387,115391391,115391393,115391397,115391399,115391403,115391405,115391409,115391412,115391416,115391418,115391422,115391424,115391428,115391430,115391434,115391437,115391441,115391443,115391447,115391449,115391453,115391455,115391459,115391461,115391465,115391467,115391471,115391473,115391477,115391479,115391483,115391485,115391489,115391491,115391495,115391497,115391501,115391503,115391507,115391510,115391514,115391516,115391520,115391526,115391530,115391534,115391538,115391541,115391545,115391547,115391551,115391554,115391558,115391561,115391565,115391567,115391571,115391574,115391578,115391580,115391584,115391586,11539159
0,115391592,115391596,115391601,115391605,115391607,115391611,115391613,115391617,115391619,115391623,115391625,115391629,115391631,115391635,115391639,115391643,115391645,115391649,115391651,115391655,115391657,115391661,115391664,115391668,115391671,115391675,115391677,115391681,115391683,115391687,115391691,115391695,115391697,115391701,115391703,115391707,115391710,115391714,115391716,115391720,115391723,115391727,115391729,115391733,115391737,115391741,115391744,115391748,115391750,115391754,115391756,115391760,115391763,115391767,115391770,115391774,115391776,115391780,115391782,115391786,115391789,115391793,115391795,115391799,115391801,115391805,115391807,115391811,115391813,115391817,115391820,115391824,115391826,115391830,115391832,115391836,115391838,115391842,115391844,115391848,115391850,115391854,115391858,115391860,115391861,115391865,115391867,115391871,115391873,115391877,115391879,115391883,115391885,115391889,115391891,115391895,115391897,115391901,115391903,115391907,115391909,115391913,115391916,115391920,115391922,115391926,115391928,115391932,115391934,115391938,115391941,115391945,115391947,115391951,115391953,115391957,115391959,115391963,115391965,115391969,115391971,115391975,115391977,115391981,115391983,115391987,115391989,115391993,115391995,115391999,115392001,115392005,115392007,115392011,115392014,115392018,115392020,115392024,115392030,115392034,115392038,115392042,115392045,115392049,115392051,115392055,115392059,115392063,115392065,115392069,115392071,115392075,115392077,115392081,115392084,115392088,115392090,115392094,115392096,115392100,115392104,115392108,115392111,115392115,115392117,115392121,115392123,115392127,115392130,115392134,115392136,115392140,115392143,115392147,115392149,115392153,115392155,115392159,115392161,115392165,115392169,115392173,115392175,115392179,115392181,115392185,115392187,115392191,115392195,115392199,115392201,115392205,115392207,115392211,115392214,115392218,115392220,115392224,115392227,11539223
1,115392233,115392237,115392241,115392245,115392248,115392252,115392255,115392259,115392261,115392265,115392267,115392271,115392274,115392278,115392280,115392284,115392286,115392290,115392293,115392297,115392299,115392303,115392305,115392309,115392311,115392315,115392317,115392321,115392324,115392328,115392330,115392334,115392336,115392340,115392342,115392346,115392348,115392352,115392354,115392358,115392362,115392364,115392365,115392369,115392371,115392375,115392377,115392381,115392383,115392387,115392389,115392393,115392395,115392399,115392401,115392405,115392407,115392411,115392413,115392417,115392419,115392423,115392425,115392429,115392432,115392436,115392438,115392442,115392445,115392449,115392451,115392455,115392457,115392461,115392463,115392467,115392469,115392473,115392475,115392479,115392481,115392485,115392487,115392491,115392493,115392497,115392499,115392503,115392505,115392509,115392512,115392516,115392518,115392522,115392524,115392528,115392534,115392538,115392543,115392547,115392549,115392553,115392555,115392559,115392563,115392567,115392570,115392574,115392576,115392580,115392582,115392586,115392588,0)
> 05/09/2008 15:34:35|dnode0-bkp.mayo.edu|ng.dbwriter.db.Database.commit|D|Thread derived commits Connection 2 (null at jdbc:mysql://rcfclusterdb.mayo.edu:3306/arco)
> 05/09/2008 15:34:35|dnode0-bkp.mayo.edu|riter.db.Database.executeQuery|D|Execute sql: SELECT hv_id FROM sge_host_values WHERE hv_time_end < {ts '2008-08-15 11:00:00.0'} AND hv_variable IN ('np_load_avg', 'cpu', 'mem_free', 'virtual_free') limit 500
>
>
>
>
>
>   



