[GE users] SGE 6.2u2 Install Fails "Admin User Missing" on all hosts

mhanby mhanby at uab.edu
Thu Sep 17 21:45:45 BST 2009


Nope, both interfaces existed and were configured during the install of the operating system.

I created the host_aliases file and that appears to work following a restart of the sgemaster service:

cat $SGE_ROOT/default/common/host_aliases

rockstest rockstest.local rockstest.uabgrid.uab.edu

Now, when sgemaster starts, the name 'rockstest' is what appears in act_qmaster.
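
For anyone finding this thread later, here is a minimal sketch of the mapping that one-line file expresses, assuming the host_aliases format is "unique name first, then its aliases" on each line. The function name resolve_alias is made up for illustration and is not an SGE tool; SGE does its own resolution internally.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): look up NAME in a
# host_aliases-style file and print the first name on the matching line.
# Names that are not listed anywhere are printed unchanged.
resolve_alias() {
    aliases_file=$1
    name=$2
    awk -v n="$name" '
        {
            # Compare every field on the line against the requested name.
            for (i = 1; i <= NF; i++)
                if ($i == n) { print $1; found = 1; exit }
        }
        END { if (!found) print n }
    ' "$aliases_file"
}

# Demo against a temporary copy of the one-line file shown above.
tmp=$(mktemp)
echo 'rockstest rockstest.local rockstest.uabgrid.uab.edu' > "$tmp"
resolve_alias "$tmp" rockstest.uabgrid.uab.edu
resolve_alias "$tmp" rockstest.local
rm -f "$tmp"
```

With the file above, both rockstest.local and rockstest.uabgrid.uab.edu map to rockstest, which matches what now shows up in act_qmaster.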

Thanks for pointing that out,

Mike

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Thursday, September 17, 2009 2:46 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] SGE 6.2u2 Install Fails "Admin User Missing" on all hosts

Am 17.09.2009 um 20:01 schrieb mhanby:

> Yes, this is the hosts file:
>
> 127.0.0.1    localhost.localdomain localhost
> 172.99.99.1  rockstest.local rockstest # originally frontend-0-0
> 192.168.2.10 rockstest.uabgrid.uab.edu
>
> This is on the head node, so it's multihomed: eth1 (192.168.2.10)
> connects to the public network, eth0 to the private cluster
> network (172.99.99.1).

According to my knowledge and
http://www.tcpipguide.com/free/t_IPReservedPrivateandLoopbackAddresses-3.htm
it's a mismatch:

172.99.*.* is a public address
192.168.*.* is private
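
As a rough illustration of that check, assuming the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), a small shell function can classify the two addresses from the hosts file. The name is_rfc1918 is hypothetical, and the sketch does no validation beyond counting four octets.

```shell
#!/bin/sh
# Hypothetical helper: succeed (status 0) if the dotted-quad IPv4 address
# given as $1 falls in one of the RFC 1918 private ranges.
is_rfc1918() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 2
    case $1 in
        10)  return 0 ;;                              # 10.0.0.0/8
        172) [ "$2" -ge 16 ] && [ "$2" -le 31 ] ;;    # 172.16.0.0/12
        192) [ "$2" -eq 168 ] ;;                      # 192.168.0.0/16
        *)   return 1 ;;
    esac
}

# Classify the two addresses from the hosts file quoted above.
for ip in 172.99.99.1 192.168.2.10; do
    if is_rfc1918 "$ip"; then
        echo "$ip private"
    else
        echo "$ip public"
    fi
done
```

This reports 172.99.99.1 as public and 192.168.2.10 as private, because 172.99 lies outside the 172.16 to 172.31 block.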

The name of eth0 will be used by SGE, unless you defined aliases in
$SGE_ROOT/default/common/host_aliases. Was one network interface added
later on?

-- Reuti

>
> It would seem that all of the processes should either use the FQDN  
> or the short name, rockstest, not both.
>
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Thursday, September 17, 2009 12:36 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] SGE 6.2u2 Install Fails "Admin User  
> Missing" on all hosts
>
> Am 17.09.2009 um 19:24 schrieb mhanby:
>
>> Some debugging, this code is in the start script:
>>
>> HOST=`$utilbin_dir/gethostname -aname`
>>
>> echo $HOST
>> rockstest.uabgrid.uab.edu
>>
>> I can't find where in the start script it comes up with
>> rockstest.local. I'm thinking this must be happening in one of the
>> executable binaries, perhaps sge_qmaster?
>
> Is it in /etc/hosts?
>
> -- Reuti
>
>
>> -----Original Message-----
>> From: mhanby [mailto:mhanby at uab.edu]
>> Sent: Thursday, September 17, 2009 12:18 PM
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] SGE 6.2u2 Install Fails "Admin User
>> Missing" on all hosts
>>
>> Now on to the second problem, the hostname of the qmaster.
>>
>> rockstest.uabgrid.uab.edu
>> rockstest.local
>>
>> The sgemaster script and the $SGE_ROOT/default/common/act_qmaster
>> file can't seem to agree on what the host name should be.
>>
>> I noticed that the installer started the sge_qmaster process. I then
>> decided to stop it:
>>
>> /etc/init.d/sgemaster.p536 stop
>>
>> no output
>>
>> /etc/init.d/sgemaster.p536 start
>> sge_master didn't start!
>> This is not a qmaster host!
>> Please, check your act_qmaster file!
>>
>> cat $(find $SGE_ROOT -name act_qmaster)
>> rockstest.local
>>
>> If I edit the act_qmaster file and replace "rockstest.local" with
>> "rockstest.uabgrid.uab.edu" and then run the stop command:
>>
>> /etc/init.d/sgemaster.p536 stop
>>   shutting down Grid Engine qmaster
>>
>> And ps confirms that no sge processes are running.
>>
>> Now, if I try to start it again (remember that act_qmaster has the
>> FQDN in it)
>> /etc/init.d/sgemaster.p536 start
>>
>>   starting sge_qmaster
>> sge_qmaster is running on another host (rockstest.uabgrid.uab.edu)
>>
>> The ps command now shows sge_qmaster is running.
>>
>> If I cat act_qmaster again, the hostname is rockstest.local
>>
>> cat $(find $SGE_ROOT -name act_qmaster)
>> rockstest.local
>>
>> The /etc/hosts file has these entries:
>>
>> 127.0.0.1    localhost.localdomain localhost
>> 172.99.99.1  rockstest.local rockstest # originally frontend-0-0
>> 192.168.2.10 rockstest.uabgrid.uab.edu
>>
>> If I swap "rockstest.local rockstest" to "rockstest
>> rockstest.local" then rockstest will end up in the act_qmaster file.
>>
>> Any ideas why the host names are getting swapped around, bungled,
>> etc...?
>>
>> Thanks,
>>
>> Mike
>>
>> -----Original Message-----
>> From: mhanby [mailto:mhanby at uab.edu]
>> Sent: Thursday, September 17, 2009 12:03 PM
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] SGE 6.2u2 Install Fails "Admin User
>> Missing" on all hosts
>>
>> I used the GE 6.2u3 installer this time and encountered the same
>> issue where the installer reports that the Admin user doesn't exist.
>>
>> I found a workaround.
>>
>> This is on a Rocks 5.1 test cluster that has Gridengine 6.1u5
>> installed. The scripts /etc/profile.d/sge-binaries.{sh,csh} were
>> causing problems with the installer. Those scripts are essentially
>> the settings.{sh,csh} from the 6.1u5 install, and apparently the
>> SGE_ROOT and other vars set in those scripts were causing problems.
>>
>> I was able to work around it by either temporarily removing those
>> from the qmaster node and exec nodes, or by unsetting the variables
>> in my /root/.bash_profile on all of the nodes.
>>
>> I would have thought the installer would override those variables
>> or unset them since you can install new versions while other
>> versions are running.
>>
>> Anyhow, I figured I'd report the info.
>>
>> Mike
>>
>> -----Original Message-----
>> From: mhanby [mailto:mhanby at uab.edu]
>> Sent: Friday, March 27, 2009 9:29 AM
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] SGE 6.2u2 Install Fails "Admin User
>> Missing" on all hosts
>>
>> Thanks, that's what I figured. I've restored the virtual machine to a
>> snapshot prior to the install of SGE 6.2u2 so I can try again.
>>
>> I'll download the latest binaries and give it another go.
>>
>> Mike
>> -----Original Message-----
>> From: Lubomir.Petrik at sun.com [mailto:Lubomir.Petrik at sun.com]
>> Sent: Thursday, March 26, 2009 12:35 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] SGE 6.2u2 Install Fails "Admin User
>> Missing" on
>> all hosts
>>
>> mhanby wrote:
>>> And on the compute node
>>> $ ssh compute-0-0 cat /tmp/check_test
>>> /share/sge/utilbin/lx24-amd64/adminrun sge test -w /share/sge
>>> exit_code=0
>>>
>>> Odd thing, this time the user lookup succeeded and both the qmaster
>>> and exec host installed without error.
>>>
>>> I realized just now, I hadn't changed the ownership of /share/sge
>>> from root to sge prior to running the install the first time. I did
>>> chown -R sge:sge /share/sge after the first install, so maybe that
>>> had something to do with it?
>>>
>> That is strange. The adminrun binary is executable by everyone, so it
>> doesn't matter who owns it. Actually, doing chown -R sge:sge /share/sge
>> is not a very good idea. You may now want to run, as root,
>> $SGE_ROOT/util/setfileperms.sh; some files must be owned by root
>> (e.g. utilbin/$ARCH/authuser).
>>
>> Lubos.
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=144176
>>
>> To unsubscribe from this discussion, e-mail:
>> [users-unsubscribe at gridengine.sunsource.net].
>>
>



