[GE users] Sun Grid Engine 6.2 - ARCO dbwriter issue

Jana Olivova Jana.Olivova at Sun.COM
Fri Sep 5 18:00:17 BST 2008


Hi Karen,

See inline,

Karen Magee wrote:
> I'll try and answer both sets of questions
>
> On Fri, Sep 05, 2008 at 04:12:13PM +0200, Jana Olivova wrote:
>   
>> Hi Karen,
>>
>> Just to get things straight: you realized that the 61u3 dbwriter 
>> had not been running for 25+ days? And you realized it after 
>> upgrading the dbwriter to 6.2?
>>
>>     
>
> We realized it while the upgrade was in progress.  
>   We did a cloned cluster install.  The SGE part was/is totally successful,
>   and the users moved over - all user jobs on the old installation were
>   completed.  We then went to change over to the ARCO software. Since
>   all the "jobs" had been listed in the reports we ran, we decided to 
>   move forward with moving over to the newer ARCO as well (this was
>   probably the incorrect move because there still was a large reporting.processing
>   file...though the file just had "host/reporting variables" records in it -
>   which we don't really care about for our processing).
>   
I think the upgrade went well, and even the values from the old 
reporting file were inserted. The 6.2 dbwriter is able to handle the 6.1 
reporting file, and since you were upgrading from 6.1u4, I don't think 
there is a problem.
>   
>> What database are you using?
>>
>>     
>
> MySQL Ver 14.12 Distrib 5.0.45
>
>   
>> Is there anything useful in the dbwriter log file? 
>> ($SGE_ROOT/$SGE_CELL/spool/dbwriter/dbwriter.log), any errors, exceptions?
>>
>>     
> unfortunately, no...just what looks to me to be a normal successful startup...
>
> 04/09/2008 11:14:06|dnode0-bkp.mayo.edu|.ReportingDBWriter.initLogging|I|Starting up dbwriter (Version 6.2) ---------------------------
> 04/09/2008 11:14:06|dnode0-bkp.mayo.edu|r.ReportingDBWriter.initialize|I|Connection to db jdbc:mysql://rcfclusterdb.mayo.edu:3306/arco
> 04/09/2008 11:14:06|dnode0-bkp.mayo.edu|r.ReportingDBWriter.initialize|I|Found database model version 8
> 04/09/2008 11:14:07|dnode0-bkp.mayo.edu|er.file.FileParser.processFile|I|Renaming reporting  to reporting.processing
> 04/09/2008 11:14:07|dnode0-bkp.mayo.edu|iter.file.FileParser.parseFile|W|0 lines marked as erroneous, these will be skipped
> 04/09/2008 11:14:07|dnode0-bkp.mayo.edu|tingDBWriter.getDbWriterConfig|I|calculation file /home/sge6_2/dbwriter/database/mysql/dbwriter.xml has changed, reread it
> 04/09/2008 11:14:13|dnode0-bkp.mayo.edu|ngDBWriter$StatisticThread.run|I|Next statistic calculation will be done at 9/4/08 12:14 PM
> 04/09/2008 11:14:31|dnode0-bkp.mayo.edu|rtingDBWriter.logEventDuration|I|calculating derived values took 0 hours 0 minutes
>   
Is this the end of the log, or is there more and you just copied a 
snippet? There should also be lines in the log file about the deletion 
time: 'deleting outdated values took X hours X minutes'. Are these 
messages there? You can also check the Performance Query in the ARCo 
web console, which shows the same information. Are there any lines in 
the log that say 'processed X lines in X minutes'?
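
A minimal sketch of that log check in Python, assuming the pipe-separated 
layout visible in the excerpt above (timestamp|host|source|level|message). 
The message fragments are the ones mentioned in this thread and may not match 
the exact wording dbwriter writes; parse_line, interesting_entries and scan 
are hypothetical helper names, not part of Grid Engine.

```python
from collections import namedtuple

LogEntry = namedtuple("LogEntry", "timestamp host source level message")

# Fragments quoted in this thread (assumption: the real wording may differ).
PATTERNS = (
    "deleting outdated values took",
    "calculating derived values took",
    "processed",
)


def parse_line(line):
    """Split one log line into its five pipe-separated fields.

    Returns None when the line does not have that shape.
    """
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return LogEntry(*parts)


def interesting_entries(lines):
    """Yield entries that are not plain info ("I") or mention a known fragment."""
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if entry.level != "I" or any(p in entry.message for p in PATTERNS):
            yield entry


def scan(path):
    """Print the interesting entries found in the log file at `path`."""
    with open(path, errors="replace") as handle:
        for entry in interesting_entries(handle):
            print(entry.timestamp, entry.level, entry.message)
```

Calling scan() with the path to dbwriter.log 
($SGE_ROOT/$SGE_CELL/spool/dbwriter/dbwriter.log) prints only the warnings 
and the timing messages, which makes it easier to see whether the deletion 
and 'processed' lines ever appear.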
>
>   
>> How have you figured out that it is taking a long time to remove the 
>> old  records?
>>
>>     
> just a guess by looking at the size (count) of the sge_host_values table
> It's decreasing...and the select command that PHP MySQL sees has that table
> in it..After running overnight, we've seen a drop of 287,000 records in
> the sge_host_values record count...but I haven't seen any of the "new"
> data from jobs that have run in the last week or so show up yet..
>   
This is not an indication that anything is wrong. In the deletion rules 
file the deletion for some host_values is set to 7 days. So, if the 
dbwriter was not running for some 25 days, it would delete a lot of 
records at once after restart. See: 
http://wikis.sun.com/display/GridEngine/Derived+Values+and+Deletion+Rules. 
Is no new data being inserted in *any* tables? Check the sge_job table.
>   
>> While doing the ARCo upgrade, did you choose to use SMF support? (the 
>> last question during dbwriter install)
>>
>>     
> No
>
>   
>> Did you run the inst_dbwriter script with the -upd option, as specified 
>> in the documentation?
>>
>>     
> Yes
>
>   
>> 'If upgrading from version < 6.2, you must run the installations script 
>> with option -upd. This will remove existing RC scripts.'
>>
>> Jana
>>
>> Lubomir Petrik wrote:
>>     
>>> How large is the unprocessed reporting file? How long has it been running? 
>>> Do any new or old records appear in ARCo?
>>>
>>> Lubos.
>>>       
>
> Currently the engine is running and the file sizes are
>
> -rw-r--r--  1 sgeadmin sgeadmin  27109033 Sep  5 09:50 reporting
> -rw-r--r--  1 sgeadmin sgeadmin 360673112 Sep  4 11:13 reporting.processing
>
> [root at dnode0 common]# wc -l reporting*
>    235247 reporting
>   3138846 reporting.processing
>
> I'm concerned that I'm in a vicious cycle with the reporting file growing and
> the reporting.processing file not finishing up.....and when it does we'll
> be back to the same thing again..
>   
Hmm, if the log that you have shown me is the whole log, then it looks 
like dbwriter is stuck somewhere. Try stopping the dbwriter, increase 
the debug level in the dbwriter.conf file, and start it again to see if 
there is anything else in the log.

Jana
> --
> Karen