[GE users] sgemaster won't start after upgrading

reuti reuti at staff.uni-marburg.de
Mon Feb 22 22:26:42 GMT 2010


On 22.02.2010 at 23:09, heywood wrote:

> But of course the old 6.2u3 arch script now returns lx26, since I have
> that symlink defined!
>
> There are significant diffs between the u3 and u5 arch scripts though.

I have one older cluster that is still on 6.1u3 (yes, 6.1). That part of
its arch script looks like the one from 6.2u5.

-- Reuti
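
A note for readers following the lx24/lx26 question: the kernel 2.4/2.6 branch of util/arch quoted further down in this thread can be sketched as a standalone POSIX sh function. This is a simplified sketch, not the shipped script. `pick_lxrelease` is a hypothetical name, the caller passes the install directory explicitly (the quoted script chooses between $SGE_ROOT and the script's own parent directory), and kernels other than 2.4/2.6 just report `unknown`. It illustrates the point made in the thread: on a 2.6 kernel the answer is lx26 only if a bin/lx26-<machine> directory exists. Because `[ -d ]` follows symlinks, a symlink such as lx26-amd64 -> lx24-amd64 is enough to flip the result.

```shell
#!/bin/sh
# Simplified sketch of the kernel 2.4/2.6 branch of $SGE_ROOT/util/arch
# as quoted in this thread. pick_lxrelease is a hypothetical helper; the
# real script inlines this logic and computes osrelease/lxmachine itself.
pick_lxrelease() {
   osrelease=$1   # kernel release, e.g. the output of `uname -r`
   lxmachine=$2   # machine part of the arch string, e.g. amd64
   root_dir=$3    # install directory expected to contain bin/lx26-<machine>
   case $osrelease in
   2.4.*)
      echo 24
      ;;
   2.6.*)
      # Report 26 only if lx26 binaries are installed; otherwise fall
      # back to 24, whose binaries also run on 2.6 kernels.
      if [ -d "$root_dir/bin/lx26-$lxmachine" ]; then
         echo 26
      else
         echo 24
      fi
      ;;
   *)
      # Other kernels are handled by other branches of the real script.
      echo unknown
      ;;
   esac
}

# Demo against a throwaway directory tree that mimics the courtesy binaries.
tmp=$(mktemp -d)
mkdir -p "$tmp/bin/lx24-amd64"
echo "lx$(pick_lxrelease 2.6.9-42.0.3.ELsmp amd64 "$tmp")-amd64"   # prints lx24-amd64
ln -s lx24-amd64 "$tmp/bin/lx26-amd64"
echo "lx$(pick_lxrelease 2.6.9-42.0.3.ELsmp amd64 "$tmp")-amd64"   # prints lx26-amd64
rm -rf "$tmp"
```

The demo mirrors the directory listing shown in the thread. With only lx24-amd64 installed, the sketch answers lx24-amd64. Once the symlink exists, it answers lx26-amd64.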


>
> On 2/22/10 5:04 PM, "heywood" <heywood at cshl.edu> wrote:
>
>> Yes, I used the updated common package (6.2u5). I just tested the old
>> 6.2u3 "arch" script and it returns lx26.
>>
>> Here are the relevant diffs between 6.2u3 "arch" and 6.2u5 "arch":
>>
>> <    2.[46].*)
>> <       # retrieve os release. We use 2.4 on kernel 2.6 machines, unless
>> <       # we have binaries installed that have been built for 2.6
>> <       case $osrelease in
>> <       2.4.*)
>> <          lxrelease=24
>> <          ;;
>> <       2.6.*)
>> <          ROOT_DIR=`dirname $0`/..
>> <          if [ "$SGE_ROOT" != "" -a -d "$SGE_ROOT/bin/lx26-${lxmachine}" ] ; then
>> <             lxrelease=26
>> <          elif [ "$SGE_ROOT" = "" -a -d "$ROOT_DIR/bin/lx26-${lxmachine}" ] ; then
>> <             lxrelease=26
>> <          else
>> <             lxrelease=24
>> <          fi
>> <          ;;
>> <       esac
>> <
>> <       # verify the GNU C lib version
>> <       # For an alternative means to determine GNU C lib version see
>> <       # http://www.gnu.org/software/libc/FAQ.html#s-4.9
>> ---
>>>    2.2.*)
>>>       lxrelease=22
>>>       ;;
>>>    2.4.*)
>>
>> On 2/22/10 5:04 PM, "reuti" <reuti at staff.uni-marburg.de> wrote:
>>
>>> On 22.02.2010 at 22:54, heywood wrote:
>>>
>>>> Looks to me like the util/arch script uses uname to get lx26 (the kernel
>>>> is 2.6.*) rather than looking for an lx26* (or lx24*) directory.
>>>>
>>>> If so, the question is why it returned lx24 for 6.2u3 (if it did on this
>>>> 2.6 system).
>>>
>>> The current script uses:
>>>
>>>     case $osrelease in
>>>     2.[46].*)
>>>        # retrieve os release. We use 2.4 on kernel 2.6 machines, unless
>>>        # we have binaries installed that have been built for 2.6
>>>        case $osrelease in
>>>        2.4.*)
>>>           lxrelease=24
>>>           ;;
>>>        2.6.*)
>>>           ROOT_DIR=`dirname $0`/..
>>>           if [ "$SGE_ROOT" != "" -a -d "$SGE_ROOT/bin/lx26-${lxmachine}" ] ; then
>>>              lxrelease=26
>>>           elif [ "$SGE_ROOT" = "" -a -d "$ROOT_DIR/bin/lx26-${lxmachine}" ] ; then
>>>              lxrelease=26
>>>           else
>>>              lxrelease=24
>>>           fi
>>>           ;;
>>>        esac
>>> ...
>>>
>>> Did you also install the updated common package?
>>>
>>> -- Reuti
>>>
>>>
>>>> But things are working, so I'm OK.
>>>>
>>>> Todd
>>>>
>>>>
>>>> On 2/22/10 4:35 PM, "reuti" <reuti at staff.uni-marburg.de> wrote:
>>>>
>>>>> On 22.02.2010 at 22:18, heywood wrote:
>>>>>
>>>>>> No, it isn't hard-coded. It returns lx26, while the directories are
>>>>>> named lx24...
>>>>>
>>>>> The current version of the arch script checks whether there is a
>>>>> directory lx26-... As you created links to the dirs, it will answer
>>>>> with lx26... But without the links, it should fall back to the default
>>>>> lx24...
>>>>>
>>>>> So, the question remains why the current version of the script
>>>>> answered lx26... although there were no links or dirs in the
>>>>> beginning.
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>> [root at bhmnode2 ~]# $SGE_ROOT/util/arch
>>>>>> lx26-amd64
>>>>>> [root at bhmnode2 ~]# uname -a
>>>>>> Linux bhmnode2.cshl.edu 2.6.9-42.0.3.ELsmp #1 SMP Mon Sep 25 17:24:31 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
>>>>>> [root at bhmnode2 ~]# ls -l $SGE_ROOT/bin
>>>>>> total 96
>>>>>> drwxr-xr-x  2 root root 4096 Feb 22 10:55 lx24-amd64
>>>>>> lrwxrwxrwx  1 root root   10 Feb 22 11:06 lx26-amd64 -> lx24-amd64
>>>>>> [root at bhmnode2 ~]#
>>>>>>
>>>>>> (I defined that symlink to get things running this morning)
>>>>>>
>>>>>>
>>>>>> On 2/22/10 4:03 PM, "reuti" <reuti at staff.uni-marburg.de> wrote:
>>>>>>
>>>>>>> On 22.02.2010 at 18:54, heywood wrote:
>>>>>>>
>>>>>>>> No, we have not compiled SGE; we have used the courtesy binaries all
>>>>>>>> along.
>>>>>>>>
>>>>>>>> The /etc/init.d/{sgemaster,sgeexecd} scripts (which are from installing
>>>>>>>> 6.2u3 last summer) are looking for lx26-*. But the utilbin and bin
>>>>>>>> directory names are lx24-*.
>>>>>>>
>>>>>>> You mean it's hardcoded in the script? AFAIK it has always used the arch
>>>>>>> script in $SGE_ROOT/util/arch by default to determine the platform
>>>>>>> it's running on. This should also return lx24-amd64 on your system.
>>>>>>>
>>>>>>> -- Reuti
>>>>>>>
>>>>>>>
>>>>>>>> Todd
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/22/10 12:41 PM, "reuti" <reuti at staff.uni-marburg.de> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 22.02.2010 at 17:28, heywood wrote:
>>>>>>>>>
>>>>>>>>>> Well. For some reason the directory in $SGE_ROOT/utilbin and
>>>>>>>>>> $SGE_ROOT/bin was "lx24-amd64", and the script was looking for
>>>>>>>>>> "lx26-amd64". We are running kernel 2.6 and always have, so I don't
>>>>>>>>>> know where that lx24* directory name came from.
>>>>>>>>>
>>>>>>>>> lx24-* reflects the minimum kernel supported by the provided binaries,
>>>>>>>>> and they will also work fine under kernel 2.6. But when you build SGE
>>>>>>>>> on your own on a 2.6 system, the created directories will be named
>>>>>>>>> according to the version it found, i.e. you get lx26-*. Did you
>>>>>>>>> compile it on your own?
>>>>>>>>>
>>>>>>>>> -- Reuti
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Anyway, I just created a symlink lx26-amd64 -> lx24-amd64, and SGE
>>>>>>>>>> started up.
>>>>>>>>>>
>>>>>>>>>> Really weird.
>>>>>>>>>>
>>>>>>>>>> Todd
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2/22/10 10:32 AM, "heywood" <heywood at cshl.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> No, I did not.
>>>>>>>>>>>
>>>>>>>>>>> I followed the patch instructions. I renamed the sge_shepherd with
>>>>>>>>>>> "mv" and unpacked these tar.gz files:
>>>>>>>>>>>
>>>>>>>>>>>  ge-6.2u5-bin-lx24-amd64.tar.gz
>>>>>>>>>>>  ge-6.2u5-common.tar.gz
>>>>>>>>>>>  hedeby-1.0u5-core.tar.gz
>>>>>>>>>>>
>>>>>>>>>>> Then I tried restarting qmaster.
>>>>>>>>>>>
>>>>>>>>>>> Todd
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2/22/10 10:25 AM, "craffi" <dag at sonsorol.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The "can't find path" error is significant. Did you (or the init
>>>>>>>>>>>> script) source or run the settings.sh|csh files to set up the SGE
>>>>>>>>>>>> environment before trying to restart the qmaster?
>>>>>>>>>>>>
>>>>>>>>>>>> -Chris
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> heywood wrote:
>>>>>>>>>>>>> I upgraded from 6.2u3 to 6.2u5, and now sgemaster will not start:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [root at bhmnode2 sge]# /etc/init.d/sgemaster.bh
>>>>>>>>>>>>> can't determine path to Grid Engine utility binaries
>>>>>>>>>>>>> [root at bhmnode2 sge]#
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>>>>
>>>>>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=245435
>>>>>>>>>>>>
>>>>>>>>>>>> To unsubscribe from this discussion, e-mail:
>>>>>>>>>>>> [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=245492

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
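
For anyone who hits the same "can't determine path to Grid Engine utility binaries" error: in this thread the failure came down to the architecture string reported by util/arch (lx26-amd64) not matching the installed bin/ and utilbin/ directories (lx24-amd64). The sketch below checks exactly that mismatch. `check_arch_dirs` is a hypothetical helper, not part of Grid Engine, and it only tests whether the two directories exist.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): report whether the bin/
# and utilbin/ directories for an architecture string exist under an SGE
# root. Returns 0 if both exist, 1 otherwise.
check_arch_dirs() {
   root=$1
   arch=$2
   status=0
   for sub in bin utilbin; do
      if [ -d "$root/$sub/$arch" ]; then
         echo "ok:      $root/$sub/$arch"
      else
         echo "missing: $root/$sub/$arch"
         status=1
      fi
   done
   return $status
}

# Typical use on an installed host, after sourcing the cell's settings
# file (e.g. $SGE_ROOT/default/common/settings.sh if the cell is "default"):
#   check_arch_dirs "$SGE_ROOT" "$("$SGE_ROOT"/util/arch)"
```

If a directory is reported missing, the workaround used in this thread was a symlink inside the affected directory, for example `ln -s lx24-amd64 lx26-amd64` in $SGE_ROOT/bin. The same link may also be needed in $SGE_ROOT/utilbin, since the original error refers to the utility binaries.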


