[GE users] qmaster migration

templedf dan.templeton at sun.com
Thu Mar 11 13:54:52 GMT 2010


The inst_sge script has options for backup and restore (-bup & -rst) 
that you could use.
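
A minimal sketch of how that might look, assuming SGE_ROOT points at the
directory that contains inst_sge (a Debian package may put the script
somewhere else). The function names backup_qmaster and restore_qmaster are
made up for this example. As far as I recall, both modes are interactive and
ask for the cell name and the backup location.

```shell
#!/bin/sh
# Sketch only: wraps the inst_sge backup and restore modes.
# Assumption: SGE_ROOT is set and inst_sge lives directly under it.

# Run on the OLD qmaster host: back up configuration and spool data.
backup_qmaster() {
    (cd "${SGE_ROOT:?SGE_ROOT is not set}" && ./inst_sge -bup)
}

# Run on the NEW qmaster host, once the backup has been copied over.
restore_qmaster() {
    (cd "${SGE_ROOT:?SGE_ROOT is not set}" && ./inst_sge -rst)
}
```

You would call backup_qmaster on the old host, copy the resulting backup to
the new machine, and call restore_qmaster there. Presumably the "act_qmaster"
file still has to name the new host afterwards, as in step 3 of the original
post.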

Daniel

On 03/11/10 03:49, pvdmeer wrote:
> Dear gridengine users,
>
> Recently I attempted to migrate the master to another machine in our
> cluster. What I did was basically this:
> 1) shut down the daemons
> 2) installed the "gridengine-master" package on the new machine and
> configured it
> 3) copied the "/var/lib/gridengine/default/common" directory from the
> old to the new machine, changed the NFS mount of this directory, and set
> the "act_qmaster" file to the new hostname
> 4) adjusted "/var/spool/gridengine/qmaster/jobseqnum" to reflect the
> current job number
> 5) started up all daemons
>
> I think I missed something here. For one, it's pretty cumbersome. More
> importantly, users noticed that running "qdel" on their jobs could
> basically freeze a machine, raising the load average to about 20.0 (!).
> As I suspected, something was wrong with the job numbers: they seemed
> to start from 0 again. Oh no!
>
> So I took one additional step: this time I made sure to also copy
> "/var/spool/gridengine/spooldb". So far, it seems to work. I can
> submit and delete jobs without a problem, but it seems like quite a chore.
>
> My questions:
>
> Is there an official, clean and easy way to migrate? I'm running
> Debian on 64-bit Intel, and SGE 6.2u3 (IIRC).
>
> Any hints are appreciated.
>
> With kind regards,
>
> Pieter van der Meer
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=248009