[GE users] Replacing SGE master server.
fernando at phenogenomics.ca
Mon Nov 22 18:04:05 GMT 2010
I would like to start off by stating that I know very little about the inner workings of SGE, and this backup restoration was pretty much dropped in my lap.
Situation: I needed to move the SGE master from the original host to a new one.
The original host was an Ubuntu 8.04LTS server.
I simply built a new Ubuntu 10.04LTS server and transferred/mimicked everything I could see.
The new server received:
- Same host name and IP
- rsynced /sge from old to new.
- reset (duplicated) all permissions on new /sge
- replicated NFS export of /sge for all connected hosts.
- transferred init.d/ startup scripts.
- restarted all sge cluster nodes, to reconnect to new sge_master nfs export "/sge"
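One way to double-check the rsync and permission steps above is to compare ownership and mode bits between the old and new trees. The following is only a rough sketch, not an SGE tool: the helper names `snapshot` and `compare_trees` are made up for illustration, and it assumes the old /sge is still readable somewhere on the new host (for example a copy or a mount at a hypothetical path such as /mnt/old_sge).

```python
import os
import stat


def snapshot(root):
    """Map each path under root (relative to root) to (uid, gid, permission bits)."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe symlinks themselves, do not follow them
            rel = os.path.relpath(full, root)
            result[rel] = (st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode))
    return result


def compare_trees(old_root, new_root):
    """Return (missing, extra, changed) lists of relative paths.

    missing: present in old_root only
    extra:   present in new_root only
    changed: present in both, but owner, group, or mode differs
    """
    old, new = snapshot(old_root), snapshot(new_root)
    missing = sorted(set(old) - set(new))
    extra = sorted(set(new) - set(old))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return missing, extra, changed


if __name__ == "__main__":
    # Hypothetical locations; adjust to wherever the two trees actually live.
    missing, extra, changed = compare_trees("/mnt/old_sge", "/sge")
    for label, paths in (("missing", missing), ("extra", extra), ("changed", changed)):
        for p in paths:
            print(label, p)
```

This compares numeric uid/gid values as seen on the machine running the script, so it only reports differences in ownership and mode. It says nothing about the contents of the files or the state of the spooling database.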
Everything seemed fine and users are able to run some jobs. However, other users get the error below when submitting specific types of jobs. I must be missing something. I would greatly appreciate any help!
----- Error --------
[2010-11-22 11:53:22] Changing status of atlas_blur_0.2_dxyz in pipe lsq12-img_26oct10.6-pairs to running
Unable to run job: error writing object "3216708" to spooling database
cannot close transaction: There is no open transaction
transaction function of rule "default rule" in context "berkeleydb
job 3216708 was rejected cause it couldn't be written.
ERROR: could not close qsub pipe lsq12-img_26oct10.6-pairs: