Opened 12 years ago
Last modified 10 years ago
#603 new defect
IZ2810: test in init script sgeexecd is inadequate
Reported by: | mathog | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | sge | Version: | 6.0u10 |
Severity: | | Keywords: | PC Linux execution |
Cc: | | | |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2810]
Issue #: 2810
Platform: PC
Reporter: mathog (mathog)
Component: gridengine
OS: Linux
Subcomponent: execution
Version: 6.0u10
CC: None defined
Status: NEW
Priority: P3
Resolution:
Issue type: DEFECT
Target milestone: ---
Assigned to: pollinger (pollinger)
QA Contact: pollinger
URL:
Summary: test in init script sgeexecd is inadequate
Status whiteboard:
Attachments:
Issue 2810 blocks:
Votes for issue 2810:

Opened: Wed Nov 26 10:04:00 -0700 2008
------------------------

The compute nodes on my cluster NFS mount the SGE distribution on /usr/SGE6, so SGE_ROOT is /usr/SGE6. If during boot this NFS mount has not completed by the time sgeexecd reaches this section of code:

    while [ ! -d "$SGE_ROOT" -a $count -le 120 ]; do
       count=`expr $count + 1`
       sleep 1
    done

an error will occur. Since /usr/SGE6 is a directory (it has to be, for NFS to mount on it), the test passes immediately and the script goes on, only to fail later.

This problem showed up after an upgrade from Mandriva 2007.1 to 2008.1, which apparently changed the boot sequence timing somehow. It took a while to find because, by the time I could log in, NFS had always mounted, so running sgeexecd manually always worked.

My fix was to change the test from "$SGE_ROOT" to "$SGE_ROOT/bin". Before the NFS mount completes, $SGE_ROOT is an empty directory, so the test fails until the mount is in place and passes after it.

------- Additional comments from mathog Tue Dec 2 14:40:22 -0700 2008 -------

Note, my "fix" only works in the case where the NFS mount is a little delayed. If it never comes through at all, the loop will execute 120 times, never satisfy the test, and then try to start up SGE anyway, even though there is no hope it will succeed.
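For illustration, the two points above (test for "$SGE_ROOT/bin" rather than "$SGE_ROOT", and stop if the mount never appears) could be combined roughly as follows. This is only a sketch, not the shipped sgeexecd code: the function name wait_for_sge_bin, its arguments, and the error message are hypothetical and were chosen for this example.

```shell
#!/bin/sh
# Hypothetical helper, not part of the distributed sgeexecd script.
# Waits for "<root>/bin" to appear, which only exists once the NFS
# mount on <root> has completed.
#   $1 = SGE root directory (e.g. "$SGE_ROOT")
#   $2 = maximum number of sleeps before giving up (default 120)
#   $3 = seconds to sleep between checks (default 1)
wait_for_sge_bin() {
    root=$1
    max_tries=${2:-120}
    delay=${3:-1}
    count=0
    while [ ! -d "$root/bin" ]; do
        if [ "$count" -ge "$max_tries" ]; then
            # The mount never came through: report it and signal failure
            # instead of falling through to start SGE anyway.
            echo "sgeexecd: $root/bin not found after $max_tries tries, giving up" >&2
            return 1
        fi
        count=$((count + 1))
        sleep "$delay"
    done
    return 0
}
```

In the init script this might be called as: wait_for_sge_bin "$SGE_ROOT" || exit 1, so that a missing mount stops the startup with a clear message rather than failing later.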