[GE users] Sun Grid Engine 6.2 - ARCO dbwriter issue
magee at mayo.edu
Fri Sep 5 16:00:30 BST 2008
I'll try to answer both sets of questions.
On Fri, Sep 05, 2008 at 04:12:13PM +0200, Jana Olivova wrote:
> Hi Karen,
> Just to get things straight. You have realized that the 61u3 dbwriter
> had not been running for 25+ days? And you had realized it after
> performing the upgrade of the dbwriter to the 6.2?
We realized it while the upgrade was in progress.
We did a cloned cluster install. The SGE part was/is totally successful,
and the users moved over - all user jobs on the old installation were
completed. We then went to change over to the ARCO software. Since
all the "jobs" had been listed in the reports we ran, we decided to
move forward with moving over to the newer ARCO as well (this was
probably the incorrect move, because there was still a large
reporting.processing file, though the file just had "host/reporting
variables" records in it, which we don't really care about for our
processing).
> What database are you using?
MySQL Ver 14.12 Distrib 5.0.45
> Is there anything useful in the dbwriter log file?
> ($SGE_ROOT/$SGE_CELL/spool/dbwriter/dbwriter.log), any errors, exceptions?
Unfortunately, no... just what looks to me to be a normal, successful startup:
04/09/2008 11:14:06|dnode0-bkp.mayo.edu|.ReportingDBWriter.initLogging|I|Starting up dbwriter (Version 6.2) ---------------------------
04/09/2008 11:14:06|dnode0-bkp.mayo.edu|r.ReportingDBWriter.initialize|I|Connection to db jdbc:mysql://rcfclusterdb.mayo.edu:3306/arco
04/09/2008 11:14:06|dnode0-bkp.mayo.edu|r.ReportingDBWriter.initialize|I|Found database model version 8
04/09/2008 11:14:07|dnode0-bkp.mayo.edu|er.file.FileParser.processFile|I|Renaming reporting to reporting.processing
04/09/2008 11:14:07|dnode0-bkp.mayo.edu|iter.file.FileParser.parseFile|W|0 lines marked as erroneous, these will be skipped
04/09/2008 11:14:07|dnode0-bkp.mayo.edu|tingDBWriter.getDbWriterConfig|I|calculation file /home/sge6_2/dbwriter/database/mysql/dbwriter.xml has changed, reread it
04/09/2008 11:14:13|dnode0-bkp.mayo.edu|ngDBWriter$StatisticThread.run|I|Next statistic calculation will be done at 9/4/08 12:14 PM
04/09/2008 11:14:31|dnode0-bkp.mayo.edu|rtingDBWriter.logEventDuration|I|calculating derived values took 0 hours 0 minutes
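For anyone who wants to scan a longer dbwriter.log for anything other than informational messages, here is a minimal sketch. It assumes only what the excerpt above shows: five pipe-separated fields (timestamp, host, source, level, message) and a day/month/year timestamp. The names `LogEntry`, `parse_line` and `non_info` are made up for illustration and are not part of ARCo.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    when: datetime
    host: str
    source: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its five fields, or return None if it does not fit."""
    # maxsplit=4 keeps any '|' characters inside the message itself.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    stamp, host, source, level, message = parts
    try:
        # Assumption: dd/mm/yyyy, inferred from the dates in the excerpt.
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
    except ValueError:
        return None
    return LogEntry(when, host, source, level, message)


def non_info(lines):
    """Return the parsed entries whose level is anything other than 'I'."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level != "I"]


# Two lines copied from the excerpt above.
sample = [
    "04/09/2008 11:14:06|dnode0-bkp.mayo.edu|r.ReportingDBWriter.initialize|I|Found database model version 8",
    "04/09/2008 11:14:07|dnode0-bkp.mayo.edu|iter.file.FileParser.parseFile|W|0 lines marked as erroneous, these will be skipped",
]

for entry in non_info(sample):
    print(entry.when.isoformat(), entry.level, entry.message)
```

On the excerpt above, the only non-informational line is the "0 lines marked as erroneous" warning, which matches my reading that nothing in the log points at a failure.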
> How have you figured out that it is taking a long time to remove the
> old records?
Just a guess from watching the size (row count) of the sge_host_values
table. It's decreasing, and the select command that PHP MySQL sees has that
table in it. After running overnight, we've seen a drop of 287,000 records
in the sge_host_values record count, but I haven't seen any of the "new"
data from jobs that have run in the last week or so show up yet.
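To turn that observation into a rough estimate of how long the cleanup might take, a back-of-the-envelope sketch follows. Only the 287,000-row drop comes from what we observed; the elapsed hours and the remaining row count are placeholders to be replaced with real measurements, and the function names are hypothetical. It also assumes the deletion rate stays constant, which may not hold.

```python
def deletion_rate(rows_removed: int, elapsed_hours: float) -> float:
    """Average rows removed per hour between two row-count samples."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    return rows_removed / elapsed_hours


def hours_to_clear(rows_remaining: int, rows_per_hour: float) -> float:
    """Hours left at a constant deletion rate; infinite if nothing is being removed."""
    if rows_per_hour <= 0:
        return float("inf")
    return rows_remaining / rows_per_hour


ROWS_REMOVED = 287_000        # observed drop in sge_host_values overnight
ELAPSED_HOURS = 14.0          # placeholder: time between the two counts
ROWS_REMAINING = 2_050_000    # placeholder: rows still due for deletion

rate = deletion_rate(ROWS_REMOVED, ELAPSED_HOURS)
print(f"about {rate:.0f} rows/hour, "
      f"about {hours_to_clear(ROWS_REMAINING, rate):.0f} hours to go")
```

Taking two `SELECT COUNT(*)` samples a known interval apart would give real values for the placeholders.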
> While doing the ARCo upgrade have you chosen to use SMF support? (the
> last question during dbwriter install)
> Did you run the inst_dbwriter script with the -upd option, as specified
> in the documentation?
> 'If upgrading from version < 6.2, you must run the installations script
> with option -upd. This will remove existing RC scripts.'
> Lubomir Petrik wrote:
> >How large is the unprocessed reporting file? How long is it running?
> >Does some new old records appear in ARCo?
Currently the engine is running and the file sizes are:
-rw-r--r-- 1 sgeadmin sgeadmin 27109033 Sep 5 09:50 reporting
-rw-r--r-- 1 sgeadmin sgeadmin 360673112 Sep 4 11:13 reporting.processing
[root at dnode0 common]# wc -l reporting*
I'm concerned that I'm in a vicious cycle with the reporting file growing and
the reporting.processing file not finishing up.....and when it does we'll
be back to the same thing again..
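One way to put a number on that worry is to estimate how fast the new reporting file is growing. The sketch below assumes the current reporting file started from empty when the dbwriter renamed the old one (11:14:07 on Sep 4 per the log) and uses the size and timestamp from the listing above; the clocks on the two hosts may differ slightly, and the function name is made up for illustration.

```python
from datetime import datetime


def growth_bytes_per_hour(size_bytes: int, started: datetime, observed: datetime) -> float:
    """Average growth rate of a file that was empty at `started`."""
    hours = (observed - started).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("observed must be later than started")
    return size_bytes / hours


# Rename time from dbwriter.log, size and mtime from the ls output above.
started = datetime(2008, 9, 4, 11, 14, 7)
observed = datetime(2008, 9, 5, 9, 50)
rate = growth_bytes_per_hour(27109033, started, observed)

print(f"reporting is growing at roughly {rate / 1e6:.2f} MB/hour")
```

If the dbwriter cannot work through reporting.processing faster than this, the backlog will never shrink, which is exactly the cycle I'm worried about.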