[GE issues] [Issue 2810] New - test in init script sgeexecd is inadequate

mathog mathog at caltech.edu
Wed Nov 26 17:04:40 GMT 2008


http://gridengine.sunsource.net/issues/show_bug.cgi?id=2810
                 Issue #|2810
                 Summary|test in init script sgeexecd is inadequate
               Component|gridengine
                 Version|6.0u10
                Platform|PC
                     URL|
              OS/Version|Linux
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P3
            Subcomponent|execution
             Assigned to|pollinger
             Reported by|mathog






------- Additional comments from mathog at sunsource.net Wed Nov 26 09:04:39 -0800 2008 -------
The compute nodes on my cluster NFS mount the SGE distribution on /usr/SGE6.
So SGE_ROOT is /usr/SGE6.  If during boot this NFS mount has not completed by
the time sgeexecd reaches this section of code:

while [ ! -d "$SGE_ROOT" -a $count -le 120 ]; do
    count=`expr $count + 1`
    sleep 1
done

an error will occur.  /usr/SGE6 is always a directory (it has to be one for
NFS to mount on it), so the test passes even when nothing is mounted yet, and
the script goes on, only to fail later.  This problem showed up after an
upgrade from Mandriva 2007.1 to 2008.1, which apparently changed the boot
sequence timing somehow.  It took a while to find because, by the time I could
log in, NFS had always mounted, so running sgeexecd manually always worked.
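
The failure mode can be reproduced without NFS.  The following is only a
sketch: it uses a throwaway temporary directory as a stand-in for the
unmounted /usr/SGE6 (an assumption for illustration, not the real SGE_ROOT),
and the variable names root_test and contents are made up here.

```shell
#!/bin/sh
# Stand-in for an NFS mount point before the mount completes:
# an existing but empty directory (hypothetical path from mktemp).
SGE_ROOT=`mktemp -d`

# The same kind of test the init script performs.
if [ -d "$SGE_ROOT" ]; then
    root_test=pass    # the wait loop would exit immediately
else
    root_test=fail
fi

# List the directory contents; nothing from the distribution is there.
contents=`ls -A "$SGE_ROOT"`

echo "-d test: $root_test, contents: '$contents'"
rmdir "$SGE_ROOT"
```

The directory test succeeds even though nothing from the SGE distribution is
visible, which is the situation described above.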

My fix was to change the test from "$SGE_ROOT" to "$SGE_ROOT/bin".  Before the
NFS mount completes, $SGE_ROOT is an empty directory, so "$SGE_ROOT/bin" does
not exist and the loop keeps waiting; once the mount completes, the directory
exists and the script proceeds.

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=90001