[GE users] fatal error, run database recovery

Joachim Gabler Joachim.Gabler at Sun.COM
Tue Mar 15 08:30:23 GMT 2005



Craig Tierney wrote:

>On Mon, 2005-03-14 at 14:26, Beadles, Jeff wrote:
>  
>
>>FYI, I ended up blasting everything, and reinstalling the grid master
>>& clients.  I was able to recover part of the database, but there were
>>several known things missing (like all.q), and who knows what else
>>unknown.
>> 
>>Regards, -Jeff
>>    
>>
>
>Would it be possible to create a database dump and 
>rebuild script?  It doesn't seem like it would be that
>difficult now that SGE 6 supports the list options that list
>each available entry (like qconf -sql).
>  
>
A cluster can be backed up and restored with the inst_sge script:
inst_sge -bup backs up all configuration data (AFAIK not the jobs),
and inst_sge -rst restores that data.

   Joachim

>We would just have to make sure that we list out every
>configuration that exists.  For those that have list, like
>queues, parallel environments, and execution hosts, we would
>iterate over each entry.
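
The iteration described above could be sketched roughly as follows. This is
a minimal sketch, not a tested backup tool: it assumes Python 3.7+, that
qconf is on the PATH, and that each list option (-sql, -spl, -sel) prints
one object name per line. The names OBJECT_TYPES, run_qconf and dump_config
are made up for illustration, and it only covers the three object types
mentioned here, not every configuration that exists. How qconf reports an
empty list is not handled.

```python
import subprocess

# (list flag, show flag) pairs for the object types named in the thread.
# Assumption: each list flag prints one object name per line.
OBJECT_TYPES = {
    "queue": ("-sql", "-sq"),
    "parallel_environment": ("-spl", "-sp"),
    "exec_host": ("-sel", "-se"),
}


def run_qconf(args):
    """Run qconf with the given arguments and return its stdout as text."""
    result = subprocess.run(
        ["qconf", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def dump_config(run=run_qconf):
    """Return {object type: {name: configuration text}} for every listed object."""
    dump = {}
    for kind, (list_flag, show_flag) in OBJECT_TYPES.items():
        listing = run([list_flag])
        names = [line.strip() for line in listing.splitlines() if line.strip()]
        # Fetch the full configuration of each object by name.
        dump[kind] = {name: run([show_flag, name]) for name in names}
    return dump
```

Each configuration text could then be written to its own file, so that a
rebuild step can feed the files back with the matching "add from file"
options (for example qconf -Aq for queues). The run parameter exists only
so the loop can be tried out without a live qmaster.
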
>
>I would like this for two reasons.  First, backing up the database would
>be good for times when database corruption occurs.  I have
>had problems with SGE 5.3 and NFS mounts when configuration files
>get corrupted.  At least you can repair the ASCII files by hand.
>For a BDB file, it might not be possible.  Second, a backup/restore
>feature would be very handy for upgrades from classic spooling
>to BDB.  This probably doesn't happen much, but it is the reason
>I want to write it in the first place.
>
>Can anyone think of any gotchas to this idea?
>
>Craig
>
>
>
>
>  
>
>>______________________________________________________________________
>>From: Beadles, Jeff [mailto:jeff_beadles at mentorg.com]
>>Sent: Mon 3/14/2005 8:28 AM
>>To: users at gridengine.sunsource.net
>>Subject: [GE users] fatal error, run database recovery
>>
>>
>>We had a disk fill up on the grid master (SGE 6.0u1) over the weekend,
>>and are now seeing the following in the qmaster's messages file when
>>trying to start up the grid master:
>> 
>>03/13/2005 21:04:49|qmaster|gmaster|E|couldn't open database
>>environment for server "local spooling", directory "/grid/spooldb":
>>(-30978) DB_RUNRECOVERY: Fatal error, run database recovery
>>03/13/2005 21:04:49|qmaster|gmaster|E|startup of rule "default rule"
>>in context "berkeleydb spooling" failed
>>03/13/2005 21:04:49|qmaster|gmaster|C|setup failed
>>
>>I've not seen a database recovery program.  Does such a program exist?
>> 
>>Any ideas on how to correct this, short of reinstalling everything?
>> 
>>FYI, this is version 6.0u1, with bdb spooling on the local (grid
>>master) host on a local disk.
>> 
>>Thanks in advance,
>>  -Jeff
>> 
>>    
>>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>  
>
