[GE users] Shadow master with Berkeley DB spooling on 6.0?

Daniel Templeton Dan.Templeton at Sun.COM
Wed Nov 7 23:18:18 GMT 2007



For a shadow master to work, three things have to be accessible.  The 
first is the heartbeat file in the qmaster's spool directory.  By 
watching for regular updates to the heartbeat file, the shadow daemon 
recognizes when the qmaster has died.  The second is the act_qmaster 
file in the $SGE_ROOT/$SGE_CELL/common directory.  The act_qmaster file 
records which host is the current qmaster; if the shadow daemon cannot 
update it, it can't start a new qmaster.  The third is, obviously, the 
qmaster's data spool.  If you're using classic spooling, the data spool 
lives in the qmaster's spool directory, so this requirement collapses 
into the first.  If you're using BDB spooling, the data spool may or 
may not be in the qmaster's spool directory, depending on your 
configuration.
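
To make the takeover idea concrete, here is a rough Python sketch of 
the logic described above: watch the heartbeat file, and if it stops 
changing, rewrite act_qmaster and (in a real setup) start a qmaster 
locally.  This is only an illustration, not the actual sge_shadowd 
implementation; the spool/qmaster location is assumed from the usual 
layout, and the timing values are invented.

import os
import socket
import time

# Illustration only -- not the real sge_shadowd.  Paths follow the
# layout described above; the interval and threshold are made up.
SGE_ROOT = os.environ.get("SGE_ROOT", "/opt/sge")
SGE_CELL = os.environ.get("SGE_CELL", "default")
QMASTER_SPOOL = os.path.join(SGE_ROOT, SGE_CELL, "spool", "qmaster")  # assumed location
HEARTBEAT = os.path.join(QMASTER_SPOOL, "heartbeat")
ACT_QMASTER = os.path.join(SGE_ROOT, SGE_CELL, "common", "act_qmaster")

CHECK_INTERVAL = 60   # seconds between heartbeat checks (invented)
MAX_MISSED = 3        # unchanged readings before we assume the qmaster died

def read_heartbeat():
    """Return the current contents of the heartbeat file."""
    with open(HEARTBEAT) as f:
        return f.read().strip()

def take_over():
    """Claim the qmaster role: point act_qmaster at this host.  A real
    shadow daemon would then start sge_qmaster locally (omitted here)."""
    with open(ACT_QMASTER, "w") as f:
        f.write(socket.gethostname() + "\n")

def main():
    last, missed = read_heartbeat(), 0
    while True:
        time.sleep(CHECK_INTERVAL)
        current = read_heartbeat()
        if current == last:
            missed += 1
            if missed >= MAX_MISSED:
                take_over()
                break
        else:
            last, missed = current, 0

if __name__ == "__main__":
    main()

The point of the sketch is that every file it touches has to be 
readable and writable from the shadow host, which is exactly why those 
three things need to live on shared storage.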

The NFSv4 requirement comes from BDB.  It has something to do with the 
file locking primitives that are available in NFSv4 but not in NFSv3.  
If you try to run BDB against data stored on a remote NFSv3 file 
system, it will refuse.  The first two requirements above are easily 
solved by placing the qmaster spool directory in the cell directory and 
NFS-sharing the cell or SGE root directory.  Because of BDB's NFSv4 
limitation, the BDB spool directory then has to go somewhere other than 
the root/cell/spool directory, unless you're using NFSv4.  Enter the 
BDB server.  Instead of monkeying with NFSv4 (even though Solaris has a 
great NFSv4 implementation), you can move the data store to a shared BDB 
server.  The problem is that now you have two points of failure: the 
NFS-shared cell directory and the remote BDB server.  If you used 
classic spooling and kept the data store in the cell directory, you'd 
only have one: the NFS-shared cell directory.
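
If you want to poke at the locking question yourself, here's a trivial 
probe (my own, not part of SGE or the BDB distribution) that just tries 
to take a POSIX byte-range lock on a file sitting on the file system 
you'd point the BDB spool at.  On a mount with working lock support it 
succeeds; on a mount without it, it blocks or errors out, which is 
roughly the situation BDB refuses to get into.  BDB's own checks are 
stricter, so a passing probe is not a guarantee.

import fcntl
import sys

# Hypothetical probe: take and release an exclusive, non-blocking
# byte-range lock on a file.  Run it with a path on the file system
# you plan to use for the BDB spool.
path = sys.argv[1] if len(sys.argv) > 1 else "lock_probe.tmp"

with open(path, "w") as f:
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        print("exclusive lock acquired on", path)
        fcntl.lockf(f, fcntl.LOCK_UN)
    except OSError as err:
        print("locking failed:", err)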

Make sense?
Daniel

John Hearns wrote:
> I have a bit of an obscure corner case...
> any insight or replies welcome please.
>
> I am setting up a cluster with:
>
> Two login/master nodes, the intention being to set the second one up as
> a shadow master
>
> The SGE_ROOT is on a separate NFS server, which is running NFSv4
>
>
> The problem I am having at the moment is that the cluster is running
> with a configuration with Berkeley DB spooling.
> SGE 6.0 version (yes - I know the)
>
> It wouldn't be too much work for me to re-install with classic spooling
> (we script installs quite heavily, so it's no big deal).
>
> However, noticing this email from Rayson:
>
> http://www.bioinformatics.org/pipermail/bioclusters/2004-September/001994.html
>
> Can anyone confirm if a combination of:
>
> SGE 6.0uX, Berkeley DB spooling to disk (not a separate server), SGE_ROOT
> on a separate NFSv4 server will work?
>
>
>
> The documentation seems to say not:
> "1) Local spooling:
> The Berkeley DB spools into a local directory on this host (qmaster host)
> This setup is faster, but you can't setup a shadow master host"
>
>
> It would be nice (Rayson, please?) to have some sort of explanation why NFSv4 
> will 'work' for this, yet a lower version wouldn't.



