[GE users] ARCo dbwriter not deleting outdated records

rafaarco rafaarco at ugr.es
Mon Aug 31 08:52:08 BST 2009


    [ The following text is in the "utf-8" character set. ]
    [ Your display is set for the "ISO-8859-10" character set.  ]
    [ Some characters may be displayed incorrectly. ]

Hi,

Thanks for the advice, Jana. I sort of know what happened now.
dbwriter.log shows this:

13/08/2009 13:55:12|grgen002|.ReportingDBWriter.initLogging|I|Starting
up dbwriter (Version 6.2u3) ---------------------------
13/08/2009 13:55:12|grgen002|r.ReportingDBWriter.initialize|I|Connection
to db jdbc:oracle:thin:@150.214.20.61:1711:e
13/08/2009 13:55:12|grgen002|r.ReportingDBWriter.initialize|I|Found
database model version 10
13/08/2009 13:55:12|grgen002|tingDBWriter.getDbWriterConfig|I|
calculation
file /usr/local/apps/n1ge6_2_ib/dbwriter/database/oracle/dbwriter.xml
has changed, reread it
13/08/2009 13:55:12|grgen002|er.file.FileParser.processFile|I|Renaming
reporting  to reporting.processing
13/08/2009 13:55:12|grgen002|iter.file.FileParser.parseFile|W|0 lines
marked as erroneous, these will be skipped
13/08/2009 13:55:13|grgen002|ngDBWriter$StatisticThread.run|I|Next
statistic calculation will be done at 13/08/09 14:55
13/08/2009 13:55:37|grgen002|rtingDBWriter.logEventDuration|I|
calculating derived values took 0 hours 0 minutes
13/08/2009 13:55:38|grgen002|rtingDBWriter.logEventDuration|I|deleting
outdated values took 0 hours 0 minutes

[...]

15/08/2009 01:11:23|grgen002|rtingDBWriter.logEventDuration|I|
calculating derived values took 0 hours 0 minutes
15/08/2009 01:11:30|grgen002|rtingDBWriter.logEventDuration|I|deleting
outdated values took 0 hours 0 minutes
15/08/2009 01:11:30|grgen002|BWriter$DerivedValueThread.run|I|Next
regular task (derived values and delete) will be done at 15/08/09 2:11

Everything OK until here. Nothing is deleted since nothing is old
enough, and the deletion task runs every hour. But then:

15/08/2009 02:01:50|grgen002|bwriter.ReportingException.log|E|
Database.closeFailed

com.sun.grid.reporting.dbwriter.ReportingException: Database.closeFailed

com.sun.grid.reporting.dbwriter.db.Database.createSQLError(Database.java:354)

com.sun.grid.reporting.dbwriter.db.Database.executeQuery(Database.java:568)

com.sun.grid.reporting.dbwriter.db.Database.test(Database.java:334)

com.sun.grid.reporting.dbwriter.ReportingDBWriter.mainLoop(ReportingDBWriter.java:1129)

com.sun.grid.reporting.dbwriter.ReportingDBWriter.run(ReportingDBWriter.java:1279)
                              Caused by java.sql.SQLException: No hay
más datos para leer del socket

oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)

oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179)

oracle.jdbc.dbaccess.DBError.check_error(DBError.java:1160)

oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java:962)

oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java:894)

oracle.jdbc.ttc7.Ocommoncall.receive(Ocommoncall.java:106)

oracle.jdbc.ttc7.TTC7Protocol.logoff(TTC7Protocol.java:396)

oracle.jdbc.driver.OracleConnection.close(OracleConnection.java:1524)

com.sun.grid.reporting.dbwriter.db.Database
$ConnectionProxy.close(Database.java:721)

com.sun.grid.reporting.dbwriter.db.Database.createSQLError(Database.java:352)
15/08/2009 02:01:50|grgen002|bwriter.ReportingException.log|E|
Database.closeFailed

com.sun.grid.reporting.dbwriter.ReportingException: Database.closeFailed

com.sun.grid.reporting.dbwriter.db.Database.createSQLError(Database.java:359)

com.sun.grid.reporting.dbwriter.db.Database.closeAll(Database.java:486)

com.sun.grid.reporting.dbwriter.ReportingDBWriter.mainLoop(ReportingDBWriter.java:1133)

com.sun.grid.reporting.dbwriter.ReportingDBWriter.run(ReportingDBWriter.java:1279)
                              Caused by java.sql.SQLException: No hay
más datos para leer del socket

oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)

oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179)

oracle.jdbc.dbaccess.DBError.check_error(DBError.java:1160)

oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java:962)

oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java:894)

oracle.jdbc.ttc7.Ocommoncall.receive(Ocommoncall.java:106)

oracle.jdbc.ttc7.TTC7Protocol.logoff(TTC7Protocol.java:396)

oracle.jdbc.driver.OracleConnection.close(OracleConnection.java:1524)

com.sun.grid.reporting.dbwriter.db.Database
$ConnectionProxy.close(Database.java:721)

com.sun.grid.reporting.dbwriter.db.Database.closeAll(Database.java:483)

15/08/2009 02:01:50|grgen002|ter.ReportingDBWriter.mainLoop|W|Connection
Error - Trying to reconnect

[...]

15/08/2009 02:10:52|grgen002|ter.ReportingDBWriter.mainLoop|W|Connection
Error - Trying to reconnect
15/08/2009 02:11:00|grgen002|BWriter$DerivedValueThread.run|E|Unknown
error: com.sun.grid.reporting.dbwriter.ReportingException: Can not
connect to database jdbc:oracle:thin:@15
0.214.20.61:1711:e: ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
SVR4 Error: 2: No such file or directory


com.sun.grid.reporting.dbwriter.ReportingException: Can not connect to
database jdbc:oracle:thin:@150.214.20.61:1711:e: ORA-01034: ORACLE not
avail
able
ORA-27101: shared memory realm does not exist
SVR4 Error: 2: No such file or directory


com.sun.grid.reporting.dbwriter.db.Database.createSQLError(Database.java:359)

com.sun.grid.reporting.dbwriter.db.Database.getConnection(Database.java:387)

com.sun.grid.reporting.dbwriter.ReportingDBWriter
$DerivedValueThread.run(ReportingDBWriter.java:833)
                              Caused by java.sql.SQLException:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
SVR4 Error: 2: No such file or directory


oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)

oracle.jdbc.ttc7.TTIoer.processError(TTIoer.java:289)

oracle.jdbc.ttc7.O3log.receive1st(O3log.java:410)

oracle.jdbc.ttc7.TTC7Protocol.logon(TTC7Protocol.java:260)

oracle.jdbc.driver.OracleConnection.<init>(OracleConnection.java:371)

oracle.jdbc.driver.OracleDriver.getConnectionInstance(OracleDriver.java:551)

oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:351)

java.sql.DriverManager.getConnection(DriverManager.java:582)

java.sql.DriverManager.getConnection(DriverManager.java:154)

com.sun.grid.reporting.dbwriter.db.Database.getConnection(Database.java:381)
15/08/2009 02:11:12|grgen002|ter.ReportingDBWriter.mainLoop|W|Connection
Error - Trying to reconnect

[...]

15/08/2009 02:14:33|grgen002|ter.ReportingDBWriter.mainLoop|W|Connection
Error - Trying to reconnect
15/08/2009 02:14:53|grgen002|er.file.FileParser.processFile|I|Renaming
reporting  to reporting.processing
15/08/2009 02:14:53|grgen002|iter.file.FileParser.parseFile|W|0 lines
marked as erroneous, these will be skipped
15/08/2009 02:15:01|grgen002|iter.file.FileParser.parseFile|I|Deleting
file reporting.processing
15/08/2009 02:15:01|grgen002|le.FileParser.createStatistics|I|Processed
3.765 lines in 7,65s (492,03 lines/s)
15/08/2009 02:15:53|grgen002|er.file.FileParser.processFile|I|Renaming
reporting  to reporting.processing
15/08/2009 02:15:53|grgen002|iter.file.FileParser.parseFile|W|0 lines
marked as erroneous, these will be skipped
15/08/2009 02:15:53|grgen002|iter.file.FileParser.parseFile|I|Deleting
file reporting.processing
15/08/2009 02:15:53|grgen002|le.FileParser.createStatistics|I|Processed
372 lines in 0,52s (718,15 lines/s)
15/08/2009 02:16:53|grgen002|er.file.FileParser.processFile|I|Renaming
reporting  to reporting.processing
15/08/2009 02:16:53|grgen002|iter.file.FileParser.parseFile|W|0 lines
marked as erroneous, these will be skipped
15/08/2009 02:16:53|grgen002|iter.file.FileParser.parseFile|I|Deleting
file reporting.processing
15/08/2009 02:16:53|grgen002|le.FileParser.createStatistics|I|Processed
172 lines in 0,35s (491,43 lines/s)
15/08/2009 02:17:53|grgen002|er.file.FileParser.processFile|I|Renaming
reporting  to reporting.processing
15/08/2009 02:17:53|grgen002|iter.file.FileParser.parseFile|W|0 lines
marked as erroneous, these will be skipped
15/08/2009 02:17:55|grgen002|iter.file.FileParser.parseFile|I|Deleting
file reporting.processing

[...]

So Oracle database wasn't available for a while and dbwriter tried to
reconnect. It succeeded and the accounting took place again normally.
However, the module that deletes the outdated values never ran again
(or, at least, it failed to delete the outdated values and write the log
entry). Shouldn't deletion get back as well as the rest of dbwriter
does?

After seeing that, I tried restarting dbwriter. At first it seemed to be
useless, because after some hours no deletion had been performed. But I
checked to database again and now the minimum value was:

2009-08-24 01:00:00.0

I checked the log again and there it was:

30/08/2009 19:08:31|grgen002|rtingDBWriter.logEventDuration|I|deleting
outdated values took 3 hours 19 minutes
30/08/2009 19:09:21|grgen002|rtingDBWriter.logEventDuration|I|
calculating derived values took 0 hours 0 minutes

I restarted dbwriter on Saturday, and the first deletion occurred on
Sunday at 7 pm. This too much time, isn't it?

Now the connection failed again at 2 am as usual and the deletion module
is not being run. So as a workaround I guess we will have to restart the
process every day until the deletion module properly recovers as the
other module of dbwriter do.

Thanks again and I hope there is a solution for this that doesn't need a
restart.

Regards,

Rafa

El vie, 28-08-2009 a las 15:20 +0200, jana escribió:
> Hi,
>
> First, check if this is the case with all the tables. Then check if in
> the dbwriter.log ($SGE_ROOT/$SGE_CELL/spool/dbwriter/dbwriter.log) are
> lines starting: 'deleting outdated values '. Also, check the log for any
> exceptions.
>
> And of course, check in few hours if the min value is really still the same.
>
> Regards,
>
> Jana
>
> rafaarco wrote:
> > I forgot: we're using SGE 6.2, ARCo 6.2u3 (this combination works
> > flawlessly with the exception of the outdated records) and Oracle 9i.
> >
> > Regards,
> >
> > Rafa
> >
> > El vie, 28-08-2009 a las 14:14 +0200, Rafael Arco Arredondo escribió:
> >
> >> Hello,
> >>
> >> We have an issue with outdated records in ARCo database. Our
> >> configuration is the default one: raw host values are kept for 7 days
> >> and dbwriter runs in continuous mode (i.e., deletion should happen once
> >> in an hour). However, the query
> >>
> >> SELECT MIN(hv_time_start) FROM sge_host_values WHERE hv_variable = 'cpu'
> >>
> >> returns
> >>
> >> 2009-08-13 11:39:37.0
> >>
> >> And the table sge_host_values keeps growing.
> >>
> >> Since today is August 28th, I expected this minimum value to be
> >> 2009-08-21. So it's seems that old rows are not deleted.
> >>
> >> Any ideas?
> >>
> >> Thank you in advance,
> >>
> >> Rafa
> >>
> >>
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=214744
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
--
Rafael Arco Arredondo
Centro de Servicios de Informática y Redes de Comunicaciones
Campus de Fuentenueva - Edificio Mecenas
Universidad de Granada
E-18071 Granada Spain
Tel: +34 958 241010   Ext:31114   E-mail: rafaarco at ugr.es

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=215111

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].



More information about the gridengine-users mailing list