[GE users] DB failure

Ari P Seitsonen ari.p.seitsonen at iki.fi
Thu May 11 16:14:44 BST 2006


Thanks for the answers. Of course, I too chose Berkeley DB because during the
quick installation I got exactly this impression that classic spooling was
"outdated" or "not-a-best-practice-anymore".

   Anyway, I was able to rescue the situation with

# > db_recover -t 05091800

i.e. recovering to a point in time just before the crash.
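As a sketch of the same recovery step, assuming Python 3 and that sge_qmaster is stopped while recovery runs, the helpers below build the db_recover command line. The function names are hypothetical. The -h flag points db_recover at the database environment (here the spooldb directory from the error message in the original report), and -t takes a timestamp of the form [[CC]YY]MMDDhhmm[.SS]. The 05091800 used above is the short MMDDhhmm form, i.e. 9 May at 18:00; the sketch writes the full year to avoid any ambiguity.

```python
from __future__ import annotations

import subprocess
from datetime import datetime


def db_recover_timestamp(when: datetime) -> str:
    """Format `when` as CCYYMMDDhhmm.SS, the long form of db_recover's -t argument."""
    return when.strftime("%Y%m%d%H%M.%S")


def db_recover_argv(env_dir: str, when: datetime) -> list[str]:
    """Build (but do not run) a point-in-time recovery command.

    -h selects the Berkeley DB environment directory (SGE's spooldb);
    -t recovers only up to `when` instead of the most recent state.
    """
    return ["db_recover", "-h", env_dir, "-t", db_recover_timestamp(when)]


def run_recovery(env_dir: str, when: datetime) -> None:
    # Assumption: the qmaster is already stopped and db_recover is on PATH.
    subprocess.run(db_recover_argv(env_dir, when), check=True)
```

For the crash in this thread, `db_recover_argv("/opt/software/sge/v6.0u7_1-target/default/spool/spooldb", datetime(2006, 5, 9, 18, 0))` yields a command ending in `-t 200605091800.00`. Keeping command construction separate from execution lets you print and check the command before touching the database.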

     Thanks again!,

        apsi

-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-=*=-
   Ari P Seitsonen / Ari.P.Seitsonen at iki.fi / http://www.iki.fi/~apsi/
   GSM: +33-6-6736 3820

On Thu, 11 May 2006, Chris Dagdigian wrote:

>
> I know that the 'way of the future' for SGE is Berkeley DB-based spooling,
> especially when we get all the replication stuff from the Berkeley product,
> but for many of the clusters I've worked on, the performance gain is simply not
> worth the inconvenience of having all the spool data in a non-human-readable
> binary storage format. On more than one occasion we've also lost entire SGE
> configurations when (for instance) buggy Apple XSAN software decides to take
> a random unscheduled coffee break. Recovering corrupt Berkeley DB files is not
> fun, whether for SVN repositories or SGE configurations.
>
> Our rule of thumb now is classic spooling for all clusters with fewer than 32
> nodes, even for people with high job throughput volumes. We tell the people
> with high job volumes to get used to Grid Engine for a while; when they
> are ready to start another round of optimization and performance tuning,
> they should simply add a switch to Berkeley DB spooling to the list of
> potential options. Scheduler tuning, filesystem performance and end-user
> workflows seem to have far more impact on performance and throughput than
> the underlying spooling technology.
>
> This opinion, of course, is colored by experience with lots of small systems
> (2 to 10 nodes on average, it seems) rather than a few massive installations,
> so other people's experiences could, and probably do, differ from mine.
>
> I was thinking, though, that it may be a good idea to write up an RFE for the
> installation scripts: maybe a bit more text on the spooling choice screen
> telling people they may want to choose classic mode if their system is
> under N nodes in size. The way the docs and installation scripts look now, a
> new user will probably always choose berkeleydb simply because classic is
> presented in a way that makes it look (to a new user) either "outdated" or
> "not-a-best-practice-anymore".
>
> -Chris
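Chris's rule of thumb and the installer hint he proposes can be sketched as follows, assuming Python 3. The function names and the hint wording are hypothetical and not taken from the SGE installation scripts; the default threshold of 32 nodes is his rule of thumb, not an official recommendation.

```python
def suggest_spooling(num_nodes: int, classic_below: int = 32) -> str:
    """Suggest a spooling method from cluster size alone.

    classic_below plays the role of the "N nodes" in the proposed RFE.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return "classic" if num_nodes < classic_below else "berkeleydb"


def spooling_hint(classic_below: int = 32) -> str:
    """Text a spooling-choice screen might show (hypothetical wording)."""
    return (
        f"Clusters with fewer than {classic_below} nodes may prefer classic "
        "spooling: its spool files are human-readable and easier to repair."
    )
```

For example, `suggest_spooling(8)` returns "classic", matching the 2 to 10 node clusters Chris describes. The threshold stays a parameter because, as he notes, scheduler tuning, filesystem performance and workflows matter more than the spooling backend, so sites will want to pick their own N.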
>
> On May 11, 2006, at 9:26 AM, Rayson Ho wrote:
>
>> If no one has a better solution, you can at least start by
>> reading the install script to see how the DB was initialized...
>>
>> BTW, if your cluster is small or your job volume is low, you should
>> use "classic spooling": it's a lot easier to maintain than BDB...
>> 
>> Rayson
>> 
>> On 5/11/06, Ari P Seitsonen <ari.p.seitsonen at iki.fi> wrote:
>>> 
>>> Dear experts on SGE,
>>>
>>>  The main disk of our small Opteron cluster filled up the other day, and
>>> as a result SGE (v6.0u7_1, compiled from the source code) crashed. Now I'm
>>> trying to restart it, but all I get is
>>> 
>>> # 05/10/2006 13:29:00|qmaster|curienite|E|couldn't open database 
>>> environment for server "local spooling", directory 
>>> "/opt/software/sge/v6.0u7_1-target/default/spool/spooldb": (-30974) 
>>> DB_RUNRECOVERY: Fatal error, run database recovery
>>> # 05/10/2006 13:29:00|qmaster|curienite|E|startup of rule "default rule" 
>>> in context "berkeleydb spooling" failed
>>> # 05/10/2006 13:29:00|qmaster|curienite|C|setup failed
>>> 
>>> when I try to run
>>> 
>>> './default/common/sgemaster start'
>>>
>>>  Running 'db_recover' or 'db_recover -c' beforehand doesn't help either.
>>> Does anyone have an idea what to do? Or at least how to create a new
>>> database? The users are impatiently waiting...
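To detect this failure automatically, here is a minimal Python 3 sketch that splits qmaster log lines of the form shown in the report (date|daemon|host|severity|message) and looks for DB_RUNRECOVERY. It assumes each entry sits on a single line, unlike the wrapped lines in this mail; the names are hypothetical, and the sketch makes no assumption about where your messages file lives.

```python
from typing import Iterable, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    daemon: str
    host: str
    severity: str  # one letter, e.g. "E" or "C" as in the log above
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its five '|'-separated fields."""
    parts = line.rstrip("\n").split("|", 4)  # maxsplit keeps any '|' inside the message
    if len(parts) != 5:
        return None
    return LogEntry(*parts)


def needs_db_recovery(lines: Iterable[str]) -> bool:
    """True if any E- or C-severity entry reports DB_RUNRECOVERY."""
    for line in lines:
        entry = parse_line(line)
        if (
            entry is not None
            and entry.severity in ("E", "C")
            and "DB_RUNRECOVERY" in entry.message
        ):
            return True
    return False
```

Because `needs_db_recovery` accepts any iterable of strings, an open file object can be passed directly. A True result only says that recovery is needed; it does not say which db_recover options will succeed.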
>>>
>>>    Thanks and greetings,
>>>
>>>       apsi
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>> 
>>> 
>> 
>
