[GE users] Installing and Configuring SGE (as execute-host) on Diskless-Cluster Nodes
jbhoren at alaska.edu
Tue Apr 6 01:03:53 BST 2010
Installing SGE on a Diskless Cluster
The cluster's head-node serves as the "Golden Node"; its operating system (RHEL 5.5) was copied to /pxe-diskless/root using this command:
rsync -a --exclude='/proc' --exclude='/sys' / /pxe-diskless/root/
system-config-netboot was used to create the tftp-boot images, and to add the cluster nodes into /pxe-diskless/snapshot.
Install the RPMs on the head-node (sun-sge-common-6.2-5 and sun-sge-bin-linux24-x64-6.2-5), then chroot /pxe-diskless/root and install them there, as well.
On the head-node, cd /gridengine/sge and run ./install_sge -m -x (this installs both the qmaster and an execution host on the head-node).
Copy the entire default directory from /gridengine/sge into /pxe-diskless/root/gridengine/sge
Copy sgeexecd from /etc/init.d into /pxe-diskless/root/etc/init.d
On the head-node, cd /gridengine/sge and run ./install_sge -ux (this uninstalls the execution host from the head-node again, leaving the qmaster in place).
Edit /pxe-diskless/snapshot/files.custom and add the line: /gridengine/sge/default/spool/
This makes each node's own copy of /gridengine/sge/default/spool take the place of the SGE spool directory mounted from the Golden Node. That is vital, because everything from /pxe-diskless/root is mounted read-only, while everything from /pxe-diskless/snapshot is mounted read-write, and each node's SGE execd needs to write to its local SGE spool directory.
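If this step is scripted, it helps to make it safe to repeat. The following is only a sketch: add_custom_entry is a hypothetical helper (not part of system-config-netboot) that appends a line to a list file such as files.custom only when that exact line is not already there.

```shell
#!/bin/sh
# Hypothetical helper: append ENTRY to LIST_FILE unless the exact line
# is already present, so the step can be re-run without duplicates.
add_custom_entry() {
    list_file=$1
    entry=$2
    # -q: quiet, -x: match the whole line, -F: treat ENTRY as a fixed string
    if ! grep -qxF -- "$entry" "$list_file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$list_file"
    fi
}

# Example (as root on the head-node), where "$spool_entry" holds the
# spool path from the step above:
#   add_custom_entry /pxe-diskless/snapshot/files.custom "$spool_entry"
```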
Make sure that the SGE qmaster daemon is running on the cluster's head-node (remember to use /sbin/chkconfig to ensure that it will start automatically on system boot!).
On the head-node, create /gridengine/sge/default/common/sge_qstat, with the line -u * as its only contents (this makes qstat list the jobs of all users by default, rather than only your own).
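One way to create that file from the shell without the * being expanded as a filename glob is sketched below. write_qstat_defaults is a hypothetical helper name; the target path is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: write a one-line qstat defaults file.
write_qstat_defaults() {
    target=$1
    # Single quotes keep the shell from expanding * against the
    # current directory's filenames.
    printf '%s\n' '-u *' > "$target"
}

# Example (on the head-node):
#   write_qstat_defaults /gridengine/sge/default/common/sge_qstat
```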
Log in to each cluster compute node, cd /gridengine/sge, and run ./install_execd
Accept the default values, and do not attempt to write or replace the SGE execd startup scripts (after all, /etc/init.d is on a filesystem which is mounted read-only). Verify that the SGE execd is running.
When you have finished, chroot /pxe-diskless/root and run /sbin/chkconfig --add sgeexecd and /sbin/chkconfig --level 345 sgeexecd on
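The two chkconfig calls can also be issued from outside the chroot by passing the command to chroot directly. This is a rough sketch only: enable_execd_in_image and the DRY_RUN variable are hypothetical, and by default the function just prints what it would run.

```shell
#!/bin/sh
# Hypothetical wrapper: register and enable sgeexecd inside the node image.
# With DRY_RUN unset or 1 it only prints the commands; set DRY_RUN=0 and
# run as root on the head-node to execute them.
enable_execd_in_image() {
    image_root=$1
    for args in "--add sgeexecd" "--level 345 sgeexecd on"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "chroot $image_root /sbin/chkconfig $args"
        else
            # $args is left unquoted on purpose so it splits into words.
            chroot "$image_root" /sbin/chkconfig $args
        fi
    done
}

# Example:
#   enable_execd_in_image /pxe-diskless/root
```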
Reboot all of the cluster's compute nodes. On the head-node, run qstat -f and qmon to verify that all execute nodes and their queues are "present and accounted-for".
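On a larger cluster, scanning the qstat -f output by eye gets tedious. The filter below is a sketch under assumptions: report_bad_queues is a hypothetical name, and it assumes the usual qstat -f column layout (queue@host, qtype, slots, load_avg, arch, then an optional states column). It prints every queue instance whose load is reported as -NA- or whose states include u (unknown) or E (error), and returns non-zero if it found any.

```shell
#!/bin/sh
# Hypothetical filter: reads `qstat -f` output on stdin and prints the
# queue instances that look unreachable or in an error state.
report_bad_queues() {
    awk '
        # Queue-instance lines start with queue@host; headers, separator
        # lines and job lines do not.
        $1 ~ /@/ {
            # Assumed columns: $4 = load_avg, $6 = states (may be empty)
            if ($4 == "-NA-" || $6 ~ /[uE]/) {
                print $1
                bad = 1
            }
        }
        END { exit bad }
    '
}

# Example (on the head-node):
#   qstat -f | report_bad_queues || echo "some execute nodes need attention"
```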
JONATHAN B. HOREN
UAF Life Science Informatics
Center for Research Services
jbhoren at alaska.edu