Opened 11 years ago

Last modified 9 years ago

#591 new defect

IZ2776: restore fails when spooling with bdb rpc server and qmaster has a local spool directory

Reported by: joga Owned by:
Priority: lowest Milestone:
Component: sge Version: 6.0
Severity: Keywords: install
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2776]

        Issue #:      2776             Platform:     All      Reporter: joga (joga)
       Component:     gridengine          OS:        All
     Subcomponent:    install          Version:      6.0         CC:    None defined
        Status:       NEW              Priority:     P5
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    dom (dom)
      QA Contact:     dom
          URL:
       * Summary:     restore fails when spooling with bdb rpc server and qmaster has a local spool directory
   Status whiteboard:
      Attachments:

     Issue 2776 blocks:
   Votes for issue 2776:


   Opened: Tue Nov 4 04:11:00 -0700 2008 
------------------------


Setup: a cluster that spools via a BDB RPC server, where qmaster has its spool
directory on a local filesystem and the BDB server host != qmaster host.

Backup with inst_sge -bup, started on the bdb server host, reports no error
but is incomplete (it does not contain the qmaster spooling data).
Restore with inst_sge -rst (on the bdb server host) fails when trying to create
the qmaster spool directory. Should it succeed, qmaster (on a different host)
will still not see the restored directory.

Making this a P5, as the scenario is not really realistic:
the reason for using the BDB RPC server is to have failover of sge_qmaster via
sge_shadowd, but failover with sge_shadowd doesn't work with a local qmaster
spool directory.

Evaluation:
Backup / restore must be started on the bdb server host (as it usually has a
local directory for the spooling data),
but the bdb server host cannot access the local qmaster spool directory on
the qmaster host.

Suggested Fix:
Make the backup / restore a two-step process, as is already done in the
qmaster installation:

Backup / restore is started on the qmaster host,
where it can back up / restore the qmaster data.
When it comes to backing up / restoring the bdb data,
the user is asked to start an inst_sge -bup/-rst -db on the bdb server host.
The backup directory must be on NFS (the same one that is used for the qmaster backup).

In case of the backup, the backup process on qmaster host can verify that the
bdb data is available in the backup directory, once the user acknowledges having
done the bdb backup.
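
The verification step could look roughly like the sketch below. It is only an
illustration of the idea, not code from inst_sge: the function name
bdb_backup_present is hypothetical, and the "*.dump" file pattern is an
assumption about how the bdb backup is named in the shared backup directory.

```shell
#!/bin/sh
# Hypothetical helper, not part of inst_sge.
# Succeeds if the backup directory contains at least one non-empty regular
# file matching the pattern. The default "*.dump" is an assumption.
bdb_backup_present() {
    backup_dir=$1
    pattern=${2:-*.dump}

    # The shared (NFS) backup directory must exist at all.
    [ -d "$backup_dir" ] || return 1

    # The pattern is left unquoted on purpose so the shell expands the glob.
    for f in "$backup_dir"/$pattern; do
        # -f: regular file, -s: size greater than zero
        if [ -f "$f" ] && [ -s "$f" ]; then
            return 0
        fi
    done
    return 1
}
```

The backup process on the qmaster host could call such a check after the user
acknowledges the bdb backup, and ask again if it fails.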

Work Around:
Do not use such a setup - it does not really make sense.
