[GE users] fatal error, run database recovery

Craig Tierney ctierney at hpti.com
Tue Mar 15 01:00:28 GMT 2005

On Mon, 2005-03-14 at 14:26, Beadles, Jeff wrote:
> FYI, I ended up blasting everything, and reinstalling the grid master
> & clients.  I was able to recover part of the database, but there were
> several known things missing (like all.q), and who knows what else
> unknown.
> Regards, -Jeff

Would it be possible to create a database dump and
rebuild script?  It doesn't seem like it would be that
difficult now that SGE 6 supports list options that print
each available entry (like qconf -sql).

We would just have to make sure that we dump every
configuration object that exists.  For those that have a list
option, like queues, parallel environments, and execution hosts,
we would iterate over each entry.
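
A rough sketch of the dump half, to make the idea concrete.  It assumes
the qconf list/show pairs -sql/-sq, -spl/-sp, and -sel/-se, and it only
covers the three object types named above; a real backup would also need
the global configuration, the scheduler configuration, and every other
object type.  The names dump_config, run_qconf, and LISTED_OBJECTS and
the output directory layout are made up for illustration, and how qconf
reports an empty list is not handled here.

```python
import subprocess
from pathlib import Path

# (list flag, show flag) pairs for the object types named in this thread.
# Assumption: other object types would be added here in the same way.
LISTED_OBJECTS = {
    "queues": ("-sql", "-sq"),
    "pe": ("-spl", "-sp"),
    "exechosts": ("-sel", "-se"),
}


def run_qconf(*args):
    """Run qconf with the given arguments and return its stdout as text."""
    result = subprocess.run(
        ["qconf", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def dump_config(outdir, run=run_qconf):
    """Write one ASCII file per listed object under outdir/<kind>/<name>.

    `run` is injectable so the loop can be exercised without a live qmaster.
    Returns the list of files written.
    """
    outdir = Path(outdir)
    written = []
    for kind, (list_flag, show_flag) in LISTED_OBJECTS.items():
        target = outdir / kind
        target.mkdir(parents=True, exist_ok=True)
        # The list option prints one object name per line.
        for name in run(list_flag).split():
            path = target / name
            path.write_text(run(show_flag, name))
            written.append(path)
    return written
```

Calling dump_config("/some/backup/dir") on the qmaster host would leave
one plain-text file per queue, PE, and execution host.  The rebuild half
could presumably feed those files back with the add-from-file options
(qconf -Aq, -Ap, -Ae), but the order matters since queues refer to hosts
and PEs, and I have not tested that.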

I would like this for two reasons.  First, a backup of the database
would help when database corruption occurs.  I have had problems
with SGE 5.3 and NFS mounts where configuration files got
corrupted.  At least you can repair the ASCII files by hand.
For a BDB file, that might not be possible.  Second, a backup/restore
feature would be very handy for upgrades from classic spooling
to BDB.  This probably doesn't happen much, but it is the reason
I want to write it in the first place.

Can anyone think of any gotchas to this idea?


> ______________________________________________________________________
> From: Beadles, Jeff [mailto:jeff_beadles at mentorg.com]
> Sent: Mon 3/14/2005 8:28 AM
> To: users at gridengine.sunsource.net
> Subject: [GE users] fatal error, run database recovery
> We had a disk fill on the grid master (SGE 6.0u1) over the weekend,
> and are now seeing the following in the qmaster's messages file when
> trying to startup the grid master:
> 03/13/2005 21:04:49|qmaster|gmaster|E|couldn't open database
> environment for server "local spooling", directory "/grid/spooldb":
> (-30978) DB_RUNRECOVERY: Fatal error, run database recovery
> 03/13/2005 21:04:49|qmaster|gmaster|E|startup of rule "default rule"
> in context "berkeleydb spooling" failed
> 03/13/2005 21:04:49|qmaster|gmaster|C|setup failed
> I've not seen a database recovery program.  Does such a program exist?
> Any ideas, short of reinstalling everything, on how to correct this?
> FYI, this is version 6.0u1, with bdb spooling on the local (grid
> master) host on a local disk.
> Thanks in advance,
>   -Jeff
