[GE users] Upgrade howto?

Paul MacInnis macinnis at dal.ca
Mon Apr 14 17:55:56 BST 2008


On Mon, 14 Apr 2008, David Olbersen wrote:

> Paul,
> 
> That's awesome, thank you so much for sharing.
> This might be the approach we end up taking.
> You say you only changed SGE_ROOT and the ports. Did you need to
> change the cell name, or was that handled by having a different root?
> 
> -- 
> David Olbersen
>  

David,

It's all handled by having the separate SGE_ROOT directories. We
use the one cell only, "default".
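
A minimal sketch of the idea, assuming hypothetical install paths and
port numbers (none of these values come from this thread). Each
installation gets its own SGE_ROOT and port pair, and both keep the
same cell name. The helper name sge_env is made up for illustration.

```python
import os


def sge_env(sge_root, qmaster_port, execd_port, cell="default", base=None):
    """Return a copy of an environment that points at one SGE install.

    Only the variables are set here; nothing is executed.
    """
    env = dict(os.environ if base is None else base)
    env.update({
        "SGE_ROOT": sge_root,
        "SGE_CELL": cell,  # same cell name for both installs
        "SGE_QMASTER_PORT": str(qmaster_port),
        "SGE_EXECD_PORT": str(execd_port),
    })
    return env


# Hypothetical locations and ports, for illustration only.
OLD = sge_env("/opt/sge-old", 6444, 6445, base={})
NEW = sge_env("/opt/sge-new", 7444, 7445, base={})

for name, env in (("old", OLD), ("new", NEW)):
    print(name, env["SGE_ROOT"], env["SGE_CELL"],
          env["SGE_QMASTER_PORT"], env["SGE_EXECD_PORT"])
```

Passing one of these dictionaries as the env argument of
subprocess.run would point a client command such as qstat at that
installation, provided the matching binaries are also on PATH.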

Paul

> 
> -----Original Message-----
> From: Paul MacInnis [mailto:macinnis at dal.ca] 
> Sent: Monday, April 14, 2008 4:05 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Upgrade howto?
> 
> On Fri, 11 Apr 2008, David Olbersen wrote:
> 
> > Hi,
> >  
> > I read the upgrade HOWTO (from 6.0 -> 6.1) and it makes it sound like 
> > you have to shut down the entire cluster to upgrade.
> > I don't know if my users will tolerate that. Are there alternatives 
> > that others have used? Is there a phased approach, or something else I
> > can do where I don't have to wait for all the jobs to finish?
> >  
> > ________________________________
> > 
> > David Olbersen
> >  
> 
> Hi David,
> 
> You might consider installing (not upgrading) 6.1 to run in parallel
> with 6.0 for a time.
> 
> This is our experience, for what it's worth ...
> 
> Last year we moved from 5.3 to 6.1.
> 
> Rather than upgrade, which wasn't possible, we did a complete new
> install of 6.1 onto the master node.  To keep the 2 versions separate we
> used a different SGE_ROOT location and different SGE_QMASTER_PORT and
> SGE_EXECD_PORT port numbers.
> 
> We cut several nodes from 5.3 and installed 6.1 on them for testing.
> We ran 6.1 and 5.3 like this in parallel for several weeks until we had
> 6.1 queues, etc. set up the way we wanted.  Note, we use classic spooling
> so everything related to each version was stored under the appropriate
> SGE_ROOT.
> 
> When everything was ready we installed 6.1 on all remaining slave nodes
> and set a date for the switchover.  On that date we changed the
> system-wide login script to source the 6.1 sge_settings.sh rather than
> the 5.3 one. We also made the 5.3 qsub non-executable.  After that new
> jobs went to 6.1 and the running jobs on 5.3 eventually finished.  Note,
> our queues all have load_avg as a load_threshold which prevented each
> scheduler from sending jobs to nodes running the other scheduler's jobs.
> 
> During the switch over we discovered that one user's jobs wouldn't work
> on 6.1 so we changed his login to source the 5.3 sge_settings.sh and
> allowed him to use 5.3's qsub until we could trace the problem.  There
> was a slight danger here that an idle node could be hit by both
> schedulers and become overloaded but it never happened.
> 
> This parallel operation worked well because the SGE developers seem to
> have taken care that all the pieces - qstat, qhost, qmod, etc - follow
> the caller's environment settings for SGE_ROOT, SGE_QMASTER_PORT and
> SGE_EXECD_PORT.
> 
> I hope this helps,
> 
> Paul
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net





