[GE users] Fixing a broken Berkeley database?

orlandorichards orlando.richards at ed.ac.uk
Wed Nov 12 11:46:21 GMT 2008


Hi Skip,

skip at pobox.com wrote:
>     Orlando> I'm happy to report that we have repaired the database, by
>     Orlando> using "db_dump" followed by "db_load".
> 
> Did db_dump complain about anything?  

Unfortunately, my live notes got lost in a web browser crash (more fool 
me for editing in a wiki!) - but I don't think that it did. Thankfully, 
I still have a copy of the corrupted spooldb/ folder, so pasted below is 
the process repeated in full for posterity!


> Any chance you updated BerkDB during
> the interval?  The Berkeley DB people have a nasty habit of changing the
> file format across minor version upgrades.  You might have run some sequence
> of commands like this (over a fairly long timeframe):
> 
>     run SGE, everything's copacetic
>     time passes
>     someone updates BerkDB
>     time passes, people forget the possible connection between SGE/BerkDB
>     stop/start SGE, all hell breaks loose
>     think about the problem
>     stop SGE
>     run db_dump/db_load
>     start SGE, everything's copacetic
> 
> Perhaps that's what you encountered.


I'm pretty sure the corruption stemmed from a crash in the SGE qmaster 
that was (probably) brought about by a very slow file system. AFAIK, SGE 
uses the BDB software that's bundled with it 
($SGE_ROOT/utilbin/lx24-amd64/), and those are certainly the binaries 
that I used to do the recovery.

--
Orlando.


-------------------------
[root at eddie01 spool]# cp -a spooldb.bak/ spooldb.test
[root at eddie01 spool]# cd spooldb.test
[root at eddie01 spooldb.test]# ls
__db.001  __db.003  __db.005  log.0000011211  sge_job
__db.002  __db.004  __db.006  sge
[root at eddie01 spooldb.test]# /exports/applications/sge/utilbin/lx24-amd64/db_verify sge
db_verify: Page 65: item 17 of unrecognizable type
db_verify: Page 65: item 18 of unrecognizable type
db_verify: Page 65: item 19 of unrecognizable type
db_verify: Page 65: item 20 of unrecognizable type
db_verify: Page 65: item 21 of unrecognizable type
db_verify: Page 65: item 22 of unrecognizable type
db_verify: Page 65: item 23 of unrecognizable type
db_verify: Page 65: item 24 of unrecognizable type
db_verify: Page 65: item 25 of unrecognizable type
db_verify: Page 65: item 26 of unrecognizable type
db_verify: Page 65: item 27 of unrecognizable type
db_verify: Page 65: item 28 of unrecognizable type
db_verify: Page 65: item 29 of unrecognizable type
db_verify: Page 65: item 30 of unrecognizable type
db_verify: Page 65: item 31 of unrecognizable type
db_verify: Page 65: item 32 of unrecognizable type
db_verify: Page 65: item 33 of unrecognizable type
db_verify: Page 65: item 34 of unrecognizable type
db_verify: Page 65: item 35 of unrecognizable type
db_verify: Page 65: item 36 of unrecognizable type
db_verify: Page 65: gap between items at offset 14216
db_verify: Page 65: item order check unsafe: skipping
db_verify: sge: DB_VERIFY_BAD: Database verification failed
[root at eddie01 spooldb.test]# /exports/applications/sge/utilbin/lx24-amd64/db_recover
[root at eddie01 spooldb.test]# /exports/applications/sge/utilbin/lx24-amd64/db_dump -f sge.out sge
[root at eddie01 spooldb.test]# mv sge sge.old
[root at eddie01 spooldb.test]# /exports/applications/sge/utilbin/lx24-amd64/db_load -f sge.out sge
[root at eddie01 spooldb.test]# /exports/applications/sge/utilbin/lx24-amd64/db_verify sge
[root at eddie01 spooldb.test]#
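
For anyone wanting to repeat this, the steps in the transcript could be wrapped up roughly as follows. This is only a sketch: the function name rebuild_db and the BDB_BIN variable are made up for illustration, and it assumes you are working in a copy of spooldb/ with the qmaster stopped, using the BDB utilities bundled with SGE.

```shell
#!/bin/sh
# Sketch of the dump/reload repair from the transcript. Run it inside a COPY
# of the spooldb directory. BDB_BIN and rebuild_db are illustrative names,
# not part of SGE; the default path assumes the lx24-amd64 architecture.
BDB_BIN="${BDB_BIN:-$SGE_ROOT/utilbin/lx24-amd64}"

rebuild_db() {
    db="$1"    # database file in the current directory, e.g. "sge"

    # Leave the file alone if it already passes verification.
    if "$BDB_BIN/db_verify" "$db"; then
        echo "$db: verifies cleanly, nothing to do"
        return 0
    fi

    "$BDB_BIN/db_recover" || return 1                  # run recovery from the log.* files
    "$BDB_BIN/db_dump" -f "$db.out" "$db" || return 1  # dump to a flat text file
    mv "$db" "$db.old" || return 1                     # keep the damaged file around
    "$BDB_BIN/db_load" -f "$db.out" "$db" || return 1  # rebuild the database from the dump
    "$BDB_BIN/db_verify" "$db"                         # non-zero exit if still broken
}
```

Usage would be something like "cd spooldb.test && rebuild_db sge". The transcript only shows the sge file being rebuilt, so whether sge_job needs the same treatment is left to a db_verify check.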



-- 
             --
    Dr Orlando Richards
   Information Services
IT Infrastructure Division
        Unix Section
     Tel: 0131 650 4994

The University of Edinburgh is a charitable body, registered in 
Scotland, with registration number SC005336.

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88524
