[GE users] SGE 6.2u2 Install Fails "Admin User Missing" on all hosts

reuti reuti at staff.uni-marburg.de
Thu Sep 17 18:35:49 BST 2009


On 17.09.2009 at 19:24, mhanby wrote:

> Some debugging, this code is in the start script:
>
> HOST=`$utilbin_dir/gethostname -aname`
>
> echo $HOST
> rockstest.uabgrid.uab.edu
>
> I can't find where in the start script it comes up with
> rockstest.local. I think this must be happening in one of the
> executable binaries, perhaps sge_qmaster?

Is it in /etc/hosts?
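
One way to check this is sketched below. It assumes the usual hosts-file convention that the first name after the address is the canonical one, and that sge_qmaster ends up using that canonical name. The second point fits the behaviour described in the quoted message but is not confirmed here. The function name canonical_name is made up for illustration; the sample entries are copied from the /etc/hosts lines quoted further down.

```shell
#!/bin/sh
# Hypothetical helper: read hosts-format data on stdin and print the
# canonical (first) name of the line on which NAME appears, whether as
# the canonical name itself or as an alias.
canonical_name() {
    awk -v want="$1" '
        { sub(/#.*/, "") }      # drop trailing comments
        NF < 2 { next }         # skip blank or address-only lines
        {
            for (i = 2; i <= NF; i++)
                if ($i == want) { print $2; exit }
        }
    '
}

# Sample data copied from the /etc/hosts entries quoted in this thread.
sample_hosts='127.0.0.1    localhost.localdomain localhost
172.99.99.1  rockstest.local rockstest # originally frontend-0-0
192.168.2.10 rockstest.uabgrid.uab.edu'

printf '%s\n' "$sample_hosts" | canonical_name rockstest
printf '%s\n' "$sample_hosts" | canonical_name rockstest.uabgrid.uab.edu
```

On the real host the same function can be fed the live file, e.g. `canonical_name rockstest < /etc/hosts`, and `getent hosts rockstest` shows what the system resolver itself returns.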

-- Reuti


> -----Original Message-----
> From: mhanby [mailto:mhanby at uab.edu]
> Sent: Thursday, September 17, 2009 12:18 PM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] SGE 6.2u2 Install Fails "Admin User  
> Missing" on all hosts
>
> Now on to the second problem, the hostname of the qmaster.
>
> rockstest.uabgrid.uab.edu
> rockstest.local
>
> The sgemaster script and the $SGE_ROOT/default/common/act_qmaster  
> file can't seem to agree on what the host name should be.
>
> I noticed that the installer had started the sge_qmaster process. I
> then decided to stop it:
>
> /etc/init.d/sgemaster.p536 stop
>
> no output
>
> /etc/init.d/sgemaster.p536 start
> sge_master didn't start!
> This is not a qmaster host!
> Please, check your act_qmaster file!
>
> cat $(find $SGE_ROOT -name act_qmaster)
> rockstest.local
>
> If I edit the act_qmaster file and replace "rockstest.local" with  
> "rockstest.uabgrid.uab.edu" and then run the stop command:
>
> /etc/init.d/sgemaster.p536 stop
>   shutting down Grid Engine qmaster
>
> And ps confirms that no sge processes are running.
>
> Now, if I try to start it again (remember that act_qmaster has the  
> FQDN in it)
> /etc/init.d/sgemaster.p536 start
>
>   starting sge_qmaster
> sge_qmaster is running on another host (rockstest.uabgrid.uab.edu)
>
> The ps command now shows sge_qmaster is running
>
> If I cat act_qmaster again, the hostname is rockstest.local
>
> cat $(find $SGE_ROOT -name act_qmaster)
> rockstest.local
>
> The /etc/hosts file has these entries:
>
> 127.0.0.1    localhost.localdomain localhost
> 172.99.99.1  rockstest.local rockstest # originally frontend-0-0
> 192.168.2.10 rockstest.uabgrid.uab.edu
>
> If I swap "rockstest.local rockstest" to "rockstest
> rockstest.local" then rockstest will end up in the act_qmaster file.
>
> Any ideas why the host names are getting swapped around, bungled,  
> etc...?
>
> Thanks,
>
> Mike
>
> -----Original Message-----
> From: mhanby [mailto:mhanby at uab.edu]
> Sent: Thursday, September 17, 2009 12:03 PM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] SGE 6.2u2 Install Fails "Admin User  
> Missing" on all hosts
>
> I used the GE 6.2u3 installer this time and encountered the same  
> issue where the installer reports that the Admin user doesn't exist.
>
> I found a workaround.
>
> This is on a Rocks 5.1 test cluster that has Gridengine 6.1u5
> installed. The scripts /etc/profile.d/sge-binaries.{sh,csh} were
> causing problems with the installer. Those scripts are essentially
> the settings.{sh,csh} from the 6.1u5 install, and apparently the
> SGE_ROOT and other vars set in those scripts were causing problems.
>
> I was able to work around it by either temporarily removing those  
> from the qmaster node and exec nodes, or by unsetting the variables  
> in my /root/.bash_profile on all of the nodes.
>
> I would have thought the installer would override those variables  
> or unset them since you can install new versions while other  
> versions are running.
>
> anywho, figured I'd report the info.
>
> Mike
>
> -----Original Message-----
> From: mhanby [mailto:mhanby at uab.edu]
> Sent: Friday, March 27, 2009 9:29 AM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] SGE 6.2u2 Install Fails "Admin User  
> Missing" on all hosts
>
> Thanks, that's what I figured. I've restored the virtual machine to a
> snapshot prior to the install of SGE 6.2u2 so I can try again.
>
> I'll download the latest binaries and give it another go.
>
> Mike
> -----Original Message-----
> From: Lubomir.Petrik at sun.com [mailto:Lubomir.Petrik at sun.com]
> Sent: Thursday, March 26, 2009 12:35 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] SGE 6.2u2 Install Fails "Admin User
> Missing" on all hosts
>
> mhanby wrote:
>> And on the compute node
>> $ ssh compute-0-0 cat /tmp/check_test
>> /share/sge/utilbin/lx24-amd64/adminrun sge test -w /share/sge
>> exit_code=0
>>
>> Odd thing, this time the user lookup succeeded and both the qmaster
>> and exec host installed without error.
>>
>> I realized just now, I hadn't changed the ownership of /share/sge
>> from root to sge prior to running the install the first time. I did
>> chown -R sge:sge /share/sge after the first install, so maybe that
>> had something to do with it?
>>
> That is strange. adminrun is executable by everyone, so it doesn't
> matter who owns it. Actually, running chown -R sge:sge /share/sge is
> not a very good idea. You may now want to run
> $SGE_ROOT/util/setfileperms.sh as root, since some files must be
> owned by root (e.g. utilbin/$ARCH/authuser).
>
> Lubos.
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=144176
>
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].
>
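
The workaround quoted above (unsetting SGE_ROOT and the other variables inherited from the old settings.sh before running the installer) could look roughly like the sketch below. It assumes every variable that matters begins with SGE_, which the thread does not confirm. The function name clear_sge_env and the example values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: unset every environment variable whose name
# starts with SGE_ in the current shell.
clear_sge_env() {
    for var in $(env | sed -n 's/^\(SGE_[A-Za-z0-9_]*\)=.*/\1/p'); do
        unset "$var"
    done
}

# Simulate settings left over from an older install (example values only).
SGE_ROOT=/opt/gridengine
SGE_CELL=default
export SGE_ROOT SGE_CELL

clear_sge_env
echo "SGE_ROOT=${SGE_ROOT:-<unset>}"
echo "SGE_CELL=${SGE_CELL:-<unset>}"
```

The function has to be defined and called in the same shell that later starts the installer. Running it from a separate script would only clear the variables in that script's own process.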

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=217677

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


