[GE issues] [Issue 2810] New - test in init script sgeexecd is inadequate
mathog at caltech.edu
Wed Nov 26 17:04:40 GMT 2008
Summary|test in init script sgeexecd is inadequate
------- Additional comments from mathog at sunsource.net Wed Nov 26 09:04:39 -0800 2008 -------
The compute nodes on my cluster NFS mount the SGE distribution on /usr/SGE6.
So SGE_ROOT is /usr/SGE6. If during boot this NFS mount has not completed by
the time sgeexecd reaches this section of code:
while [ ! -d "$SGE_ROOT" -a $count -le 120 ]; do
count=`expr $count + 1`
an error will occur. Since /usr/SGE6 is a directory (it has to be, for NFS to mount
on it), the test will pass and the script will go on, only to fail later. This
problem showed up after an upgrade from Mandriva 2007.1 to 2008.1, which
apparently changed the boot sequence timing somehow. It took a while to find
this since, as soon as I could log in, NFS had always mounted, so that running
sgeexecd manually always worked.
My fix was to change the test from "$SGE_ROOT" to "$SGE_ROOT/bin". Before the NFS
mount completes, $SGE_ROOT is an empty directory, so the -d test on "$SGE_ROOT/bin"
fails and the loop keeps waiting; once the mount is in place the test passes.
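
For illustration, here is a minimal sketch of the wait loop with the changed test.
The function name wait_for_sge_bin, its three arguments, and the sleep between
checks are my own assumptions for the example and are not copied from the shipped
sgeexecd script; only the -d test, the counter, and the limit of 120 come from the
fragment quoted above.

```shell
#!/bin/sh
# Sketch only: wait until the NFS-mounted SGE tree is really visible.
# wait_for_sge_bin and its arguments are hypothetical names for this example.
wait_for_sge_bin() {
    root=$1          # directory SGE_ROOT points to, e.g. /usr/SGE6
    max_tries=$2     # upper bound on the counter (120 in the quoted loop)
    delay=$3         # seconds to sleep between checks (assumed value)
    count=0
    # "$root" exists as a bare mount point even before NFS is up, so testing
    # it proves nothing; "$root/bin" only appears once the mount is complete.
    while [ ! -d "$root/bin" ] && [ "$count" -le "$max_tries" ]; do
        count=`expr $count + 1`
        sleep "$delay"
    done
    # Return status: 0 if the tree is mounted, non-zero if we timed out.
    [ -d "$root/bin" ]
}
```

In an init script this might be called as
wait_for_sge_bin "$SGE_ROOT" 120 1 || exit 1
so that a timeout is reported at the wait loop rather than by a later failure.
The sketch joins two separate [ ] tests with && instead of using -a, which
behaves the same here and avoids the obsolescent -a operator.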