[GE users] Replacing SGE master server.

fernandosilva fernando at phenogenomics.ca
Mon Nov 22 18:04:05 GMT 2010


I should say up front that I know very little about the inner workings of SGE; this backup restoration was pretty much dropped in my lap.

Situation: I needed to move the original SGE master to a new host.
The original host was an Ubuntu 8.04LTS server.
I simply built a new Ubuntu 10.04LTS server and transferred/mimicked everything I could see.
The new server received:
- The same host name and IP.
- /sge, rsynced from old to new.
- All permissions on the new /sge reset (duplicated from the old host).
- The same NFS export of /sge for all connected hosts.
- The init.d/ startup scripts, transferred from the old host.
- A restart of all SGE cluster nodes, so that they reconnect to the new sge_master NFS export "/sge".
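
For anyone wanting to double-check the permissions step in the list above, here is a minimal sketch of how the two trees could be compared. It assumes the old /sge is still reachable somewhere (for example mounted at a hypothetical /mnt/old_sge). The function names tree_metadata and diff_tree_metadata are made up for illustration and are not part of SGE. It only compares permission bits and owner/group IDs, not file contents.

```python
import os
import stat


def tree_metadata(root):
    """Map each path under root (relative to root) to (mode bits, uid, gid)."""
    meta = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe symlinks themselves, do not follow them
            rel = os.path.relpath(full, root)
            meta[rel] = (stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid)
    return meta


def diff_tree_metadata(old_root, new_root):
    """Return (relative path, old_meta, new_meta) for every entry that differs.

    An entry that exists on only one side is reported with None on the other.
    """
    old = tree_metadata(old_root)
    new = tree_metadata(new_root)
    diffs = []
    for rel in sorted(set(old) | set(new)):
        if old.get(rel) != new.get(rel):
            diffs.append((rel, old.get(rel), new.get(rel)))
    return diffs
```

Calling diff_tree_metadata("/mnt/old_sge", "/sge") (paths are assumptions) would return an empty list if ownership and modes match everywhere; any returned entry points at a file or directory whose permissions or owner differ, or that is missing on one side.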

Everything seemed fine, and users are able to run some jobs.  Other users, however, get the error below when submitting specific types of jobs.  I must be missing something.  I would greatly appreciate any help!

----- Error --------
[2010-11-22 11:53:22] Changing status of atlas_blur_0.2_dxyz in pipe
lsq12-img_26oct10.6-pairs to running
Unable to run job: error writing object "3216708" to spooling database
cannot close transaction: There is no open transaction
transaction function of rule "default rule" in context "berkeleydb
spooling" failed
job 3216708 was rejected cause it couldn't be written.
Exiting.
ERROR: could not close qsub pipe lsq12-img_26oct10.6-pairs:
-----------------------



