[GE users] SGE 5.3p6 broken since IP/subnet migration - need help!

Richard Hobbs richard.hobbs at crl.toshiba.co.uk
Mon Jun 25 21:24:08 BST 2007



Hello,

Right, well, I do not have any entries in /etc/services that contain "sge"
(and there's no way this file has been manually edited since the re-IP).
You are correct that the $SGE_COMMD_PORT variable is not set, but I don't
know where it's meant to be set, because that string does not appear in
any of the files in our $SGE_ROOT directory (again, there's no way these
files could have been manually edited since the re-IP).

So, on that basis, this worked before we changed the IP address and subnet
mask, with the config files in the same state they are in now. Why has it
suddenly broken?

Also, what is the "$SGE_COMMD_PORT" variable meant to be set to? Where can
I specify it, and as which user can I "echo" it?
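
For what it's worth, the two places Chris mentions can be checked with a
small script. This is only a sketch: find_commd_port is a made-up helper
name, not anything shipped with SGE, and trying the environment variable
before the services file is an assumption about the lookup order rather
than something confirmed for 5.3p6.

```shell
#!/bin/sh
# find_commd_port: hypothetical helper, not part of SGE.
# Prints where an sge_commd port definition was found:
#   env:<port>       if $SGE_COMMD_PORT is set (assumed to take precedence)
#   services:<port>  if a services-format file has an sge_commd .../tcp line
#   missing          otherwise (exit status 1)
find_commd_port() {
    services_file="${1:-/etc/services}"

    if [ -n "${SGE_COMMD_PORT:-}" ]; then
        echo "env:${SGE_COMMD_PORT}"
        return 0
    fi

    # Look for an uncommented line whose first field is exactly sge_commd
    # and whose second field ends in /tcp, then print the port part.
    port=$(awk '$1 == "sge_commd" && $2 ~ /\/tcp$/ {
                    split($2, a, "/"); print a[1]; exit
                }' "$services_file")

    if [ -n "$port" ]; then
        echo "services:${port}"
        return 0
    fi

    echo "missing"
    return 1
}
```

Running find_commd_port with no argument reads /etc/services. On hosts
that take services from NIS or another source listed in
/etc/nsswitch.conf, "getent services sge_commd" is the more complete
check, since it consults those sources as well.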

Regarding the hostname resolution issues, here is my output:

======================================================================
[root@stg2 glinux]# pwd
/rmt/stg2_1/sge/utilbin/glinux
[root@stg2 glinux]# ./gethostname
Hostname: stg2.domain.co.uk
Aliases:
Host Address(es): 192.168.144.2
[root@stg2 glinux]# ./gethostbyaddr 192.168.144.2
Hostname: stg2.domain.co.uk
Aliases:
Host Address(es): 192.168.144.2
[root@stg2 glinux]#
======================================================================

As you can see, it works perfectly!

And, as I said before, I've tried adding this host to /etc/hosts and it
does not make a difference.
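
One thing that may be worth ruling out after a re-IP is a stale line for
the same name elsewhere in /etc/hosts, since with file-based lookups the
first matching line typically wins. The sketch below lists every address
a hosts-format file maps to a given name. hosts_entries_for is a
hypothetical helper name, not an SGE or system tool.

```shell
#!/bin/sh
# hosts_entries_for: hypothetical helper, not an SGE or system tool.
# Prints, in file order, the address of every line in a hosts-format
# file that lists the given name as hostname or alias.
hosts_entries_for() {
    name="$1"
    hosts_file="${2:-/etc/hosts}"

    awk -v n="$name" '
        { sub(/#.*/, "") }                 # strip comments
        NF >= 2 {
            for (i = 2; i <= NF; i++) {    # fields after the address
                if ($i == n) { print $1; break }
            }
        }' "$hosts_file"
}
```

For example, "hosts_entries_for stg2.domain.co.uk" prints one address per
matching line in /etc/hosts. More than one line of output, or an address
from the old 192.168.3.0 network, would point to a leftover entry.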

Remember this is version 5.3p6 by the way! :-)

Thanks again,
Richard.


On Mon, June 25, 2007 9:02 pm, Chris Dagdigian said:
> Hello,
>
> Seems like 2 problems ...
>
> The sge_commd/tcp error is caused by a missing entry in /etc/services
> for sge_commd or perhaps an accidentally unset $SGE_COMMD_PORT
> environment variable.
>
> Everything else after that does not matter. You need to sort out the
> TCP port and services issue before anything else has a hope of
> starting up properly.
>
> For hostname and resolution issues, the best tools to use are the
> actual SGE binaries in your utilbin directory:
>
> Example:
>> [root@dcore-amd sge-6s2u1]# /opt/sge-6s2u1/utilbin/lx26-amd64/gethostname
>> Hostname: dcore-amd.sonsorol.net
>> Aliases:  dcore-amd
>> Host Address(es): 66.92.70.152
>>
>> [root@dcore-amd sge-6s2u1]# /opt/sge-6s2u1/utilbin/lx26-amd64/gethostbyaddr 66.92.70.152
>> Hostname: dcore-amd.sonsorol.net
>> Aliases:  dcore-amd
>> Host Address(es): 66.92.70.152
>> [root@dcore-amd sge-6s2u1]#
>
> If you fix your sge_commd/tcp error it may start -- I've always
> personally found that SGE will honor entries in the /etc/hosts file
>
> -Chris
>
> On Jun 25, 2007, at 3:57 PM, Richard Hobbs wrote:
>
>> Hello,
>>
>> We have recently migrated our network from a 192.168.3.0/255.255.255.0
>> network to a 192.168.128.0/255.255.128.0 network, and since doing so, our
>> qmaster will not start.
>>
>> We keep getting the following:
>>
>> ======================================================================
>> [root@stg2 sge]# /etc/init.d/rcsge start
>>    starting sge_qmaster
>> critical error: can't check for running qmaster: can't resolve service "sge_commd/tcp"
>>    starting sge_schedd
>> error: can't resolve hostname "stg2.domain.co.uk"
>> error: can't get configuration from qmaster -- backgrounding
>>    starting sge_execd
>> critical error: can't enroll to commd: CANT GET SERVICE
>> [root@stg2 sge]#
>> ======================================================================
>>
>> Does anyone know what is causing this?
>>
>> I have even tried a global find and replace of the old IP address range
>> for the new IP address range, but it still doesn't start up.
>>
>> I'm getting desperate now, and have no ideas left, so any suggestions are
>> gratefully received! :-)
>>
>> Just for the record, the same user on the same machine in the same
>> terminal *can* resolve stg2.domain.co.uk, as below:
>>
>> ======================================================================
>> [root@stg2 sge]# host stg2.domain.co.uk
>> stg2.domain.co.uk has address 192.168.144.2
>> [root@stg2 sge]#
>> ======================================================================
>>
>> And yes, I've also tried adding stg2.domain.co.uk to /etc/hosts, but the
>> qmaster just will not start.
>>
>> Please help! :-)
>>
>> Thanks in advance,
>> Richard.
>>
>> --
>> Richard Hobbs (Systems Administrator)
>> Toshiba Research Europe Ltd. - Speech Technology Group
>> Web: http://www.toshiba-europe.com/research/
>> Email: richard.hobbs at crl.toshiba.co.uk
>> Tel: +44 1223 376964        Mobile: +44 7811 803377
>>
>>
>>
>>
>> _____________________________________________________________________
>> This e-mail has been scanned for viruses by Verizon Business
>> Internet Managed Scanning Services - powered by MessageLabs. For
>> further information visit http://www.verizonbusiness.com/uk
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
> --
> Chris Dagdigian  <dag at sonsorol.org>
> Current coordinates: Boston-area, USA
> GPS: http://bioteam.net/dagbin/gps?42.385693+N+71.115535+W

-- 
Richard Hobbs (Systems Administrator)
Toshiba Research Europe Ltd. - Speech Technology Group
Web: http://www.toshiba-europe.com/research/
Email: richard.hobbs at crl.toshiba.co.uk
Tel: +44 1223 376964        Mobile: +44 7811 803377


