[GE users] Sgemaster won't start after upgrading

heywood heywood at cshl.edu
Mon Feb 22 22:04:59 GMT 2010


Yes, I used the updated common package (6.2u5). I just tested the old 6.2u3
"arch" script, and it returns lx26.

Here are the relevant diffs between 6.2u3 "arch" and 6.2u5 "arch":

<    2.[46].*)
<       # retrieve os release. We use 2.4 on kernel 2.6 machines, unless
<       # we have binaries installed that have been built for 2.6
<       case $osrelease in
<       2.4.*) 
<          lxrelease=24
<          ;;
<       2.6.*) 
<          ROOT_DIR=`dirname $0`/..
<          if [ "$SGE_ROOT" != "" -a -d "$SGE_ROOT/bin/lx26-${lxmachine}" ] ; then
<             lxrelease=26
<          elif [ "$SGE_ROOT" = "" -a -d "$ROOT_DIR/bin/lx26-${lxmachine}" ] ; then
<             lxrelease=26
<          else
<             lxrelease=24
<          fi
<          ;;
<       esac
< 
<       # verify the GNU C lib version
<       # For an alternative means to determine GNU C lib version see
<       # http://www.gnu.org/software/libc/FAQ.html#s-4.9
---
>    2.2.*)
>       lxrelease=22
>       ;;
>    2.4.*)
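The 6.2u3 logic on the `<` side of the diff can be exercised on its own. It only reports 26 on a 2.6 kernel when a bin/lx26-<machine> directory exists, and `[ -d ]` is also true for a symlink that points at a directory. That would explain why the old script now returns lx26 once the lx26-amd64 -> lx24-amd64 link is in place. The following is a simplified re-implementation under assumptions, not the real util/arch: the function name `probe_lxrelease` is hypothetical, and the SGE root is passed as an argument instead of being taken from `$SGE_ROOT` or `dirname $0`.

```shell
#!/bin/sh
# Simplified re-implementation of the 6.2u3 util/arch rule for Linux
# 2.4/2.6 kernels (the "<" side of the diff). Hypothetical helper, not
# part of Grid Engine. The real script probes $SGE_ROOT or `dirname $0`/..;
# here a single root directory is passed in instead.
#   $1 = kernel release as printed by `uname -r`, e.g. 2.6.9-42.0.3.ELsmp
#   $2 = machine suffix used in the directory name, e.g. amd64
#   $3 = SGE root directory to probe
probe_lxrelease() {
   osrelease=$1
   lxmachine=$2
   root=$3
   case $osrelease in
   2.4.*)
      echo 24
      ;;
   2.6.*)
      # Report 26 only if 2.6 binaries appear to be installed. Note that
      # -d also succeeds for a symlink to a directory, e.g.
      # lx26-amd64 -> lx24-amd64.
      if [ -d "$root/bin/lx26-$lxmachine" ]; then
         echo 26
      else
         echo 24
      fi
      ;;
   *)
      # Other kernel releases are outside the scope of this sketch.
      echo unknown
      ;;
   esac
}
```

On a 2.6 kernel, `probe_lxrelease "$(uname -r)" amd64 "$SGE_ROOT"` prints 24 while bin/ holds only lx24-amd64, and 26 after the symlink is added.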




On 2/22/10 5:04 PM, "reuti" <reuti at staff.uni-marburg.de> wrote:

> On 22.02.2010 at 22:54, heywood wrote:
> 
>> Looks to me like the util/arch script uses uname to get lx26 (the kernel
>> is 2.6.*), rather than looking for a directory named lx26* (or lx24*).
>> 
>> If so, the question is why it returned lx24 for 6.2u3 (if it did on this
>> 2.6 system).
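A rule that derives the name from `uname` alone, as described here, never looks at the installed directories, so on this 2.6.9 x86_64 node it would yield lx26-amd64 even when only bin/lx24-amd64 exists. The sketch below is illustrative only: `kernel_only_arch` is a hypothetical name, and it assumes just the two mappings visible in this thread (2.4/2.6 kernels to 24/26, x86_64 to amd64); the real arch script covers many more platforms.

```shell
#!/bin/sh
# Hypothetical helper: build an lxNN-<machine> string purely from the
# kernel release and machine type, without probing $SGE_ROOT/bin.
#   $1 = `uname -r` output, $2 = `uname -m` output
# Prints the arch string, or an error on stderr and returns 1.
kernel_only_arch() {
   case $1 in
   2.4.*) rel=24 ;;
   2.6.*) rel=26 ;;
   *)     echo "unsupported kernel release: $1" >&2; return 1 ;;
   esac
   case $2 in
   x86_64) mach=amd64 ;;
   *)      echo "unsupported machine type: $2" >&2; return 1 ;;
   esac
   echo "lx$rel-$mach"
}
```

On the node shown later in this thread, `kernel_only_arch "$(uname -r)" "$(uname -m)"` would print lx26-amd64.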
> 
> The actual script uses:
> 
>     case $osrelease in
>     2.[46].*)
>        # retrieve os release. We use 2.4 on kernel 2.6 machines, unless
>        # we have binaries installed that have been built for 2.6
>        case $osrelease in
>        2.4.*)
>           lxrelease=24
>           ;;
>        2.6.*)
>           ROOT_DIR=`dirname $0`/..
>           if [ "$SGE_ROOT" != "" -a -d "$SGE_ROOT/bin/lx26-${lxmachine}" ] ; then
>              lxrelease=26
>           elif [ "$SGE_ROOT" = "" -a -d "$ROOT_DIR/bin/lx26-${lxmachine}" ] ; then
>              lxrelease=26
>           else
>              lxrelease=24
>           fi
>           ;;
>        esac
> ...
> 
> Did you also install the updated common package?
> 
> -- Reuti
> 
> 
>> But things are working, so I'm OK.
>> 
>> Todd
>> 
>> 
>> On 2/22/10 4:35 PM, "reuti" <reuti at staff.uni-marburg.de> wrote:
>> 
>>> On 22.02.2010 at 22:18, heywood wrote:
>>> 
>>>> No, it isn't hard coded. It returns lx26, while the directories are
>>>> named lx24...
>>> 
>>> The current version of the arch script checks whether there is a
>>> directory lx26-... As you created links to the dirs, it will answer
>>> with lx26... But without the links, it should fall back to the default
>>> lx24...
>>> 
>>> So the question remains why the current version of the script
>>> answered lx26... although there were no links or dirs in the
>>> beginning.
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> [root@bhmnode2 ~]# $SGE_ROOT/util/arch
>>>> lx26-amd64
>>>> [root@bhmnode2 ~]# uname -a
>>>> Linux bhmnode2.cshl.edu 2.6.9-42.0.3.ELsmp #1 SMP Mon Sep 25 17:24:31 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
>>>> [root@bhmnode2 ~]# ls -l $SGE_ROOT/bin
>>>> total 96
>>>> drwxr-xr-x  2 root root 4096 Feb 22 10:55 lx24-amd64
>>>> lrwxrwxrwx  1 root root   10 Feb 22 11:06 lx26-amd64 -> lx24-amd64
>>>> [root@bhmnode2 ~]#
>>>> 
>>>> (I defined that symlink to get things running this morning)
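The mismatch in this listing can be checked directly by comparing the arch string with what exists under bin/ and utilbin/. A minimal sketch, assuming the arch string is passed in (on a live node it would come from `$SGE_ROOT/util/arch`); `check_arch_dirs` is a hypothetical helper, not part of Grid Engine.

```shell
#!/bin/sh
# Hypothetical check: report whether <root>/bin/<arch> and
# <root>/utilbin/<arch> exist for the arch string util/arch returned.
#   $1 = SGE root, $2 = arch string (e.g. lx26-amd64)
# Returns 0 if both directories exist, 1 otherwise.
check_arch_dirs() {
   root=$1
   arch=$2
   status=0
   for sub in bin utilbin; do
      if [ -d "$root/$sub/$arch" ]; then
         echo "ok:      $root/$sub/$arch"
      else
         # List what is actually there to make the mismatch obvious.
         echo "missing: $root/$sub/$arch (present: $(ls "$root/$sub" 2>/dev/null | tr '\n' ' '))"
         status=1
      fi
   done
   return $status
}
```

For example, `check_arch_dirs "$SGE_ROOT" "$($SGE_ROOT/util/arch)"` returns non-zero and lists the existing entries when either directory is missing.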
>>>> 
>>>> 
>>>> On 2/22/10 4:03 PM, "reuti" <reuti at staff.uni-marburg.de> wrote:
>>>> 
>>>>> On 22.02.2010 at 18:54, heywood wrote:
>>>>> 
>>>>>> No, we have not compiled SGE, but have used courtesy binaries all
>>>>>> along.
>>>>>> 
>>>>>> The /etc/init.d/{sgemaster,sgeexecd} scripts (which are from installing
>>>>>> 6.2u3 last summer) are looking for lx26-*. But the utilbin and bin
>>>>>> directory names are lx24-*.
>>>>> 
>>>>> You mean it's hardcoded in the script? AFAIK it always used the arch
>>>>> script in $SGE_ROOT/util/arch by default to determine the platform
>>>>> it's running on. This should also return lx24-amd64 on your system.
>>>>> 
>>>>> -- Reuti
>>>>> 
>>>>> 
>>>>>> Todd
>>>>>> 
>>>>>> 
>>>>>> On 2/22/10 12:41 PM, "reuti" <reuti at staff.uni-marburg.de> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> On 22.02.2010 at 17:28, heywood wrote:
>>>>>>> 
>>>>>>>> Well. For some reason the directory in $SGE_ROOT/utilbin and
>>>>>>>> $SGE_ROOT/bin was "lx24-amd64", and the script was looking for
>>>>>>>> "lx26-amd64". We are running kernel 2.6 and always have, so I don't
>>>>>>>> know where that lx24* directory name came from.
>>>>>>> 
>>>>>>> The lx24-* name reflects the minimum kernel supported by the provided
>>>>>>> binaries, and they also work fine under kernel 2.6. But when you build
>>>>>>> SGE on your own on a 2.6 system, the created directories will be named
>>>>>>> according to the kernel version it found, i.e. you get lx26-*. Did you
>>>>>>> compile it on your own?
>>>>>>> 
>>>>>>> -- Reuti
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Anyway, I just created a symlink lx26-amd64 -> lx24-amd64, and SGE
>>>>>>>> started up.
>>>>>>>> 
>>>>>>>> Really weird.
>>>>>>>> 
>>>>>>>> Todd
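The symlink workaround can be made idempotent so it only adds a link when the lx26 entry is absent and the lx24 directory is present. This is a sketch under assumptions: `link_lx24_as_lx26` is a hypothetical helper, and it links both bin/ and utilbin/, although the listing in this thread only shows the bin/ link. Since the lx24 courtesy binaries also run under kernel 2.6, the link only papers over the name mismatch.

```shell
#!/bin/sh
# Hypothetical helper: create <sub>/lx26-<mach> -> lx24-<mach> symlinks
# under an SGE root, for sub in bin and utilbin, only when the lx26 entry
# is absent and the lx24 directory is present. Safe to re-run.
#   $1 = SGE root, $2 = machine suffix (e.g. amd64)
link_lx24_as_lx26() {
   root=$1
   mach=$2
   for sub in bin utilbin; do
      src="lx24-$mach"
      dst="$root/$sub/lx26-$mach"
      if [ -e "$dst" ] || [ -L "$dst" ]; then
         # Covers real directories, working links and dangling links.
         echo "exists, left alone: $dst"
      elif [ -d "$root/$sub/$src" ]; then
         # Relative target, resolved inside $root/$sub.
         ln -s "$src" "$dst" && echo "linked: $dst -> $src"
      else
         echo "no $src under $root/$sub, nothing to link" >&2
      fi
   done
}
```

Running it a second time only prints "exists, left alone" for each entry.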
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 2/22/10 10:32 AM, "heywood" <heywood at cshl.edu> wrote:
>>>>>>>> 
>>>>>>>>> No, I did not.
>>>>>>>>> 
>>>>>>>>> I followed the patch instructions. I renamed the sge_shepherd with
>>>>>>>>> "mv" and unpacked these tar.gz files:
>>>>>>>>> 
>>>>>>>>>  ge-6.2u5-bin-lx24-amd64.tar.gz
>>>>>>>>>  ge-6.2u5-common.tar.gz
>>>>>>>>>  hedeby-1.0u5-core.tar.gz
>>>>>>>>> 
>>>>>>>>> Then I tried restarting qmaster.
>>>>>>>>> 
>>>>>>>>> Todd
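The update steps listed here can be written out roughly as follows. This is a sketch, not the official patch procedure: the shepherd location bin/<arch>/sge_shepherd, the `.6.2u3` backup suffix, extracting every tarball into the SGE root, and the name `apply_update` are all assumptions. It relies on GNU tar's `-z` and `-C` options.

```shell
#!/bin/sh
# Hypothetical sketch of the update steps described above:
#   1. move the old sge_shepherd aside with mv
#   2. unpack the new tarballs on top of the SGE root
#   $1 = SGE root, $2 = arch directory (e.g. lx24-amd64),
#   remaining arguments = .tar.gz files to extract
apply_update() {
   root=$1
   arch=$2
   shift 2
   shepherd="$root/bin/$arch/sge_shepherd"
   if [ -f "$shepherd" ]; then
      # Assumed backup name; the patch instructions may use another.
      mv "$shepherd" "$shepherd.6.2u3" || return 1
   fi
   for tarball in "$@"; do
      # Extract relative to the SGE root (GNU tar: -z gunzip, -C chdir).
      tar -xzf "$tarball" -C "$root" || return 1
   done
}
```

For this upgrade the call would look like `apply_update "$SGE_ROOT" lx24-amd64 ge-6.2u5-bin-lx24-amd64.tar.gz ge-6.2u5-common.tar.gz hedeby-1.0u5-core.tar.gz`, assuming all three tarballs unpack relative to the SGE root. Note that the binary tarball is the lx24-amd64 one.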
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 2/22/10 10:25 AM, "craffi" <dag at sonsorol.org> wrote:
>>>>>>>>> 
>>>>>>>>>> The "can't find path" error is significant. Did you (or the init
>>>>>>>>>> script) source or run the settings.sh|csh files to set up the SGE
>>>>>>>>>> environment before trying to restart the qmaster?
>>>>>>>>>> 
>>>>>>>>>> -Chris
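Loading the environment file that the installer writes under the cell's common directory, before running the init script or util/arch, rules out an unset or wrong `$SGE_ROOT`. A minimal sketch for Bourne-style shells, assuming the cell is named `default` unless another name is given; `load_sge_env` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: source <root>/<cell>/common/settings.sh into the
# current shell and check that SGE_ROOT ended up set.
#   $1 = SGE root, $2 = cell name (optional, defaults to "default")
load_sge_env() {
   settings="$1/${2:-default}/common/settings.sh"
   if [ ! -r "$settings" ]; then
      echo "cannot read $settings" >&2
      return 1
   fi
   # Dot-sourcing a path containing a slash reads that exact file.
   . "$settings"
   if [ -z "$SGE_ROOT" ]; then
      echo "SGE_ROOT still unset after sourcing $settings" >&2
      return 1
   fi
}
```

Because the file is dot-sourced, the exported variables stay in the calling shell, so `load_sge_env /path/to/sge && /etc/init.d/sgemaster.bh` runs the init script with that environment.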
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> heywood wrote:
>>>>>>>>>>> I upgraded from 6.2u3 to 6.2u5, and now sgemaster will not start:
>>>>>>>>>>> 
>>>>>>>>>>> [root@bhmnode2 sge]# /etc/init.d/sgemaster.bh
>>>>>>>>>>> can't determine path to Grid Engine utility binaries
>>>>>>>>>>> [root@bhmnode2 sge]#
>>>>>>>>>> 
>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=245435
>>>>>>>>>> 
>>>>>>>>>> To unsubscribe from this discussion, e-mail:
>>>>>>>>>> [users-unsubscribe at gridengine.sunsource.net].


