[GE users] ARCo dbwriter not deleting outdated records

jana jana.olivova at sun.com
Mon Aug 31 09:44:32 BST 2009


    [ The following text is in the "utf-8" character set. ]
    [ Your display is set for the "ISO-8859-10" character set.  ]
    [ Some characters may be displayed incorrectly. ]

Hi,

The deletion should also start as normally, but it depends on how much
data is in the reporting file. The dbwriter takes the timestamp from the
line of the reporting file. So if for example the dbwriter was not
inserting for a few days and there is lot of data that needs to be
processed, it may take a while before the deletion kicks in, as the the
timestamp of the data being inserted is not old enough. It does not
delete the data by the calendar date, but by the timestamp that is
written in the reporting file. So, data being inserted today, might have
timestamp 2 or 3 days old and therefore it is not a time to delete
anything because the oldest data in the database is not 7 days alder
then this timestamp.  Hope that makes sense.

The other way when derived values and deletion rules are run is always
at the start up of dbwriter. Also, the deletion is not run until the
derived values are computed for the given period.

So, it is hard to say, there could be a bug, but it is also possible
that the deletion rules would be performed again even if you did not
restart it.

What concerns me more is why is the connection failing.

Jana

On 08/31/09 09:52, rafaarco wrote:
> Hi,
>
> Thanks for the advice, Jana. I sort of know what happened now.
> dbwriter.log shows this:
>
> 13/08/2009 13:55:12|grgen002|.ReportingDBWriter.initLogging|I|Starting
> up dbwriter (Version 6.2u3) ---------------------------
> 13/08/2009 13:55:12|grgen002|r.ReportingDBWriter.initialize|I|Connection
> to db jdbc:oracle:thin:@150.214.20.61:1711:e
> 13/08/2009 13:55:12|grgen002|r.ReportingDBWriter.initialize|I|Found
> database model version 10
> 13/08/2009 13:55:12|grgen002|tingDBWriter.getDbWriterConfig|I|
> calculation
> file /usr/local/apps/n1ge6_2_ib/dbwriter/database/oracle/dbwriter.xml
> has changed, reread it
> 13/08/2009 13:55:12|grgen002|er.file.FileParser.processFile|I|Renaming
> reporting  to reporting.processing
> 13/08/2009 13:55:12|grgen002|iter.file.FileParser.parseFile|W|0 lines
> marked as erroneous, these will be skipped
> 13/08/2009 13:55:13|grgen002|ngDBWriter$StatisticThread.run|I|Next
> statistic calculation will be done at 13/08/09 14:55
> 13/08/2009 13:55:37|grgen002|rtingDBWriter.logEventDuration|I|
> calculating derived values took 0 hours 0 minutes
> 13/08/2009 13:55:38|grgen002|rtingDBWriter.logEventDuration|I|deleting
> outdated values took 0 hours 0 minutes
>
> [...]
>
> 15/08/2009 01:11:23|grgen002|rtingDBWriter.logEventDuration|I|
> calculating derived values took 0 hours 0 minutes
> 15/08/2009 01:11:30|grgen002|rtingDBWriter.logEventDuration|I|deleting
> outdated values took 0 hours 0 minutes
> 15/08/2009 01:11:30|grgen002|BWriter$DerivedValueThread.run|I|Next
> regular task (derived values and delete) will be done at 15/08/09 2:11
>
> Everything OK until here. Nothing is deleted since nothing is old
> enough, and the deletion task runs every hour. But then:
>
> 15/08/2009 02:01:50|grgen002|bwriter.ReportingException.log|E|
> Database.closeFailed
>
> com.sun.grid.reporting.dbwriter.ReportingException: Database.closeFailed
>
> com.sun.grid.reporting.dbwriter.db.Database.createSQLError(Database.java:354)
>
> com.sun.grid.reporting.dbwriter.db.Database.executeQuery(Database.java:568)
>
> com.sun.grid.reporting.dbwriter.db.Database.test(Database.java:334)
>
> com.sun.grid.reporting.dbwriter.ReportingDBWriter.mainLoop(ReportingDBWriter.java:1129)
>
> com.sun.grid.reporting.dbwriter.ReportingDBWriter.run(ReportingDBWriter.java:1279)
>                               Caused by java.sql.SQLException: No hay
> más datos para leer del socket
>
> oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)
>
> oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179)
>
> oracle.jdbc.dbaccess.DBError.check_error(DBError.java:1160)
>
> oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java:962)
>
> oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java:894)
>
> oracle.jdbc.ttc7.Ocommoncall.receive(Ocommoncall.java:106)
>
> oracle.jdbc.ttc7.TTC7Protocol.logoff(TTC7Protocol.java:396)
>
> oracle.jdbc.driver.OracleConnection.close(OracleConnection.java:1524)
>
> com.sun.grid.reporting.dbwriter.db.Database
> $ConnectionProxy.close(Database.java:721)
>
> com.sun.grid.reporting.dbwriter.db.Database.createSQLError(Database.java:352)
> 15/08/2009 02:01:50|grgen002|bwriter.ReportingException.log|E|
> Database.closeFailed
>
> com.sun.grid.reporting.dbwriter.ReportingException: Database.closeFailed
>
> com.sun.grid.reporting.dbwriter.db.Database.createSQLError(Database.java:359)
>
> com.sun.grid.reporting.dbwriter.db.Database.closeAll(Database.java:486)
>
> com.sun.grid.reporting.dbwriter.ReportingDBWriter.mainLoop(ReportingDBWriter.java:1133)
>
> com.sun.grid.reporting.dbwriter.ReportingDBWriter.run(ReportingDBWriter.java:1279)
>                               Caused by java.sql.SQLException: No hay
> más datos para leer del socket
>
> oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)
>
> oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179)
>
> oracle.jdbc.dbaccess.DBError.check_error(DBError.java:1160)
>
> oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java:962)
>
> oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java:894)
>
> oracle.jdbc.ttc7.Ocommoncall.receive(Ocommoncall.java:106)
>
> oracle.jdbc.ttc7.TTC7Protocol.logoff(TTC7Protocol.java:396)
>
> oracle.jdbc.driver.OracleConnection.close(OracleConnection.java:1524)
>
> com.sun.grid.reporting.dbwriter.db.Database
> $ConnectionProxy.close(Database.java:721)
>
> com.sun.grid.reporting.dbwriter.db.Database.closeAll(Database.java:483)
>
> 15/08/2009 02:01:50|grgen002|ter.ReportingDBWriter.mainLoop|W|Connection
> Error - Trying to reconnect
>
> [...]
>
> 15/08/2009 02:10:52|grgen002|ter.ReportingDBWriter.mainLoop|W|Connection
> Error - Trying to reconnect
> 15/08/2009 02:11:00|grgen002|BWriter$DerivedValueThread.run|E|Unknown
> error: com.sun.grid.reporting.dbwriter.ReportingException: Can not
> connect to database jdbc:oracle:thin:@15
> 0.214.20.61:1711:e: ORA-01034: ORACLE not available
> ORA-27101: shared memory realm does not exist
> SVR4 Error: 2: No such file or directory
>
>
> com.sun.grid.reporting.dbwriter.ReportingException: Can not connect to
> database jdbc:oracle:thin:@150.214.20.61:1711:e: ORA-01034: ORACLE not
> avail
> able
> ORA-27101: shared memory realm does not exist
> SVR4 Error: 2: No such file or directory
>
>
> com.sun.grid.reporting.dbwriter.db.Database.createSQLError(Database.java:359)
>
> com.sun.grid.reporting.dbwriter.db.Database.getConnection(Database.java:387)
>
> com.sun.grid.reporting.dbwriter.ReportingDBWriter
> $DerivedValueThread.run(ReportingDBWriter.java:833)
>                               Caused by java.sql.SQLException:
> ORA-01034: ORACLE not available
> ORA-27101: shared memory realm does not exist
> SVR4 Error: 2: No such file or directory
>
>
> oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)
>
> oracle.jdbc.ttc7.TTIoer.processError(TTIoer.java:289)
>
> oracle.jdbc.ttc7.O3log.receive1st(O3log.java:410)
>
> oracle.jdbc.ttc7.TTC7Protocol.logon(TTC7Protocol.java:260)
>
> oracle.jdbc.driver.OracleConnection.<init>(OracleConnection.java:371)
>
> oracle.jdbc.driver.OracleDriver.getConnectionInstance(OracleDriver.java:551)
>
> oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:351)
>
> java.sql.DriverManager.getConnection(DriverManager.java:582)
>
> java.sql.DriverManager.getConnection(DriverManager.java:154)
>
> com.sun.grid.reporting.dbwriter.db.Database.getConnection(Database.java:381)
> 15/08/2009 02:11:12|grgen002|ter.ReportingDBWriter.mainLoop|W|Connection
> Error - Trying to reconnect
>
> [...]
>
> 15/08/2009 02:14:33|grgen002|ter.ReportingDBWriter.mainLoop|W|Connection
> Error - Trying to reconnect
> 15/08/2009 02:14:53|grgen002|er.file.FileParser.processFile|I|Renaming
> reporting  to reporting.processing
> 15/08/2009 02:14:53|grgen002|iter.file.FileParser.parseFile|W|0 lines
> marked as erroneous, these will be skipped
> 15/08/2009 02:15:01|grgen002|iter.file.FileParser.parseFile|I|Deleting
> file reporting.processing
> 15/08/2009 02:15:01|grgen002|le.FileParser.createStatistics|I|Processed
> 3.765 lines in 7,65s (492,03 lines/s)
> 15/08/2009 02:15:53|grgen002|er.file.FileParser.processFile|I|Renaming
> reporting  to reporting.processing
> 15/08/2009 02:15:53|grgen002|iter.file.FileParser.parseFile|W|0 lines
> marked as erroneous, these will be skipped
> 15/08/2009 02:15:53|grgen002|iter.file.FileParser.parseFile|I|Deleting
> file reporting.processing
> 15/08/2009 02:15:53|grgen002|le.FileParser.createStatistics|I|Processed
> 372 lines in 0,52s (718,15 lines/s)
> 15/08/2009 02:16:53|grgen002|er.file.FileParser.processFile|I|Renaming
> reporting  to reporting.processing
> 15/08/2009 02:16:53|grgen002|iter.file.FileParser.parseFile|W|0 lines
> marked as erroneous, these will be skipped
> 15/08/2009 02:16:53|grgen002|iter.file.FileParser.parseFile|I|Deleting
> file reporting.processing
> 15/08/2009 02:16:53|grgen002|le.FileParser.createStatistics|I|Processed
> 172 lines in 0,35s (491,43 lines/s)
> 15/08/2009 02:17:53|grgen002|er.file.FileParser.processFile|I|Renaming
> reporting  to reporting.processing
> 15/08/2009 02:17:53|grgen002|iter.file.FileParser.parseFile|W|0 lines
> marked as erroneous, these will be skipped
> 15/08/2009 02:17:55|grgen002|iter.file.FileParser.parseFile|I|Deleting
> file reporting.processing
>
> [...]
>
> So Oracle database wasn't available for a while and dbwriter tried to
> reconnect. It succeeded and the accounting took place again normally.
> However, the module that deletes the outdated values never ran again
> (or, at least, it failed to delete the outdated values and write the log
> entry). Shouldn't deletion get back as well as the rest of dbwriter
> does?
>
> After seeing that, I tried restarting dbwriter. At first it seemed to be
> useless, because after some hours no deletion had been performed. But I
> checked to database again and now the minimum value was:
>
> 2009-08-24 01:00:00.0
>
> I checked the log again and there it was:
>
> 30/08/2009 19:08:31|grgen002|rtingDBWriter.logEventDuration|I|deleting
> outdated values took 3 hours 19 minutes
> 30/08/2009 19:09:21|grgen002|rtingDBWriter.logEventDuration|I|
> calculating derived values took 0 hours 0 minutes
>
> I restarted dbwriter on Saturday, and the first deletion occurred on
> Sunday at 7 pm. This too much time, isn't it?
>
> Now the connection failed again at 2 am as usual and the deletion module
> is not being run. So as a workaround I guess we will have to restart the
> process every day until the deletion module properly recovers as the
> other module of dbwriter do.
>
> Thanks again and I hope there is a solution for this that doesn't need a
> restart.
>
> Regards,
>
> Rafa
>
> El vie, 28-08-2009 a las 15:20 +0200, jana escribió:
>
>> Hi,
>>
>> First, check if this is the case with all the tables. Then check if in
>> the dbwriter.log ($SGE_ROOT/$SGE_CELL/spool/dbwriter/dbwriter.log) are
>> lines starting: 'deleting outdated values '. Also, check the log for any
>> exceptions.
>>
>> And of course, check in few hours if the min value is really still the same.
>>
>> Regards,
>>
>> Jana
>>
>> rafaarco wrote:
>>
>>> I forgot: we're using SGE 6.2, ARCo 6.2u3 (this combination works
>>> flawlessly with the exception of the outdated records) and Oracle 9i.
>>>
>>> Regards,
>>>
>>> Rafa
>>>
>>> El vie, 28-08-2009 a las 14:14 +0200, Rafael Arco Arredondo escribió:
>>>
>>>
>>>> Hello,
>>>>
>>>> We have an issue with outdated records in ARCo database. Our
>>>> configuration is the default one: raw host values are kept for 7 days
>>>> and dbwriter runs in continuous mode (i.e., deletion should happen once
>>>> in an hour). However, the query
>>>>
>>>> SELECT MIN(hv_time_start) FROM sge_host_values WHERE hv_variable = 'cpu'
>>>>
>>>> returns
>>>>
>>>> 2009-08-13 11:39:37.0
>>>>
>>>> And the table sge_host_values keeps growing.
>>>>
>>>> Since today is August 28th, I expected this minimum value to be
>>>> 2009-08-21. So it's seems that old rows are not deleted.
>>>>
>>>> Any ideas?
>>>>
>>>> Thank you in advance,
>>>>
>>>> Rafa
>>>>
>>>>
>>>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=214744
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
HI

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=215123

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].



More information about the gridengine-users mailing list