[GE users] SGE 6.2u2 Install Fails "Admin User Missing" on all hosts

mhanby mhanby at uab.edu
Thu Mar 26 15:56:37 GMT 2009


Interesting. After rearranging the code as shown below, the init.d script
now reports that my qmaster is not in the act_qmaster file.

Looking in the act_qmaster file, it has one entry:

rocks5-test.local

Now if I run the command that the init.d script uses to get the
hostname, it returns the fully qualified name:

$utilbin_dir/gethostname -aname
rocks5-test.eng.uab.edu

Possibly the installer is using a different method to determine the name
of the qmaster?

Changing the act_qmaster entry to the fully qualified name got sgemaster
to start on the qmaster host.
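
For anyone hitting the same thing, here is a minimal sketch of the comparison that exposed the mismatch. The function name check_act_qmaster and its two arguments are hypothetical and not part of SGE. I am assuming the file lives at $SGE_ROOT/<cell>/common/act_qmaster and that the local name comes from the gethostname utility shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): compare the host recorded in an
# act_qmaster file with the name this host reports for itself.
#   $1 = path to the act_qmaster file
#   $2 = local host name, e.g. output of: $utilbin_dir/gethostname -aname
check_act_qmaster() {
   act_file=$1
   local_name=$2

   if [ ! -r "$act_file" ]; then
      echo "cannot read $act_file"
      return 2
   fi

   # Assumption: the file holds a single host name on its first line.
   recorded=`head -n 1 "$act_file" | tr -d ' \t\r'`

   if [ "$recorded" = "$local_name" ]; then
      echo "match: $recorded"
      return 0
   fi

   echo "mismatch: act_qmaster has '$recorded', host reports '$local_name'"
   return 1
}
```

With the values from this thread (rocks5-test.local in the file, rocks5-test.eng.uab.edu from gethostname) it returns 1 and reports the mismatch. The comparison is a plain string match, so it does not try to decide whether a short name and a fully qualified name refer to the same machine.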

Mike

-----Original Message-----
From: mhanby [mailto:mhanby at uab.edu] 
Sent: Thursday, March 26, 2009 10:45 AM
To: users at gridengine.sunsource.net
Subject: RE: [GE users] SGE 6.2u2 Install Fails "Admin User Missing" on all hosts

After the install I'm not able to start the qmaster; the init.d script
doesn't produce any output.
The command you asked me to run returns 0 (after loading settings.sh
into my profile).

Looking through the startup routine, there's a block of code that
appears to be backwards:

   if [ $qmaster = true ]; then
      DetectSMFService qmaster
      #We want to use smf
      if [ -z "$SMF_FMRI" -a -n "$service"  ]; then
         echo "   starting sge_qmaster"
         svcadm enable -st $service
      #For -migrate with SMF qmaster_host is not yet set for SMF start (2nd)
      elif [ $qmaster_host = true -o \( -n "$SMF_FMRI" -a "$SMF_FMRI" = "$service" \) ]; then
         echo "   starting sge_qmaster"
         $bin_dir/sge_qmaster
         [ $? -eq 0 -a -d /var/lock/subsys ] && touch /var/lock/subsys/sgemaster >/dev/null 2>&1
         CheckRunningQmaster
      fi
   elif [ $qmaster = true -a $qmaster_host = false ]; then
      echo
      echo "sge_qmaster didn't start!"
      echo "This is not a qmaster host!"
      echo "Please, check your act_qmaster file!"
      echo
   fi

The elif can never run, because $qmaster = true is already matched by
the first if. Maybe it should be:
   if [ $qmaster = true -a $qmaster_host = false ]; then
      echo
      echo "sge_qmaster didn't start!"
      echo "This is not a qmaster host!"
      echo "Please, check your act_qmaster file!"
      echo
   elif [ $qmaster = true ]; then
      DetectSMFService qmaster
      #We want to use smf
      if [ -z "$SMF_FMRI" -a -n "$service"  ]; then
         echo "   starting sge_qmaster"
         svcadm enable -st $service
      #For -migrate with SMF qmaster_host is not yet set for SMF start (2nd)
      elif [ $qmaster_host = true -o \( -n "$SMF_FMRI" -a "$SMF_FMRI" = "$service" \) ]; then
         echo "   starting sge_qmaster"
         $bin_dir/sge_qmaster
         [ $? -eq 0 -a -d /var/lock/subsys ] && touch /var/lock/subsys/sgemaster >/dev/null 2>&1
         CheckRunningQmaster
      fi
   fi
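
To see why the order of the two tests matters, here is a stripped-down sketch with the SMF and startup logic replaced by echo. The function names original_order and fixed_order are made up for illustration; only the if/elif conditions are taken from the script.

```shell
#!/bin/sh
# Illustration only: same if/elif conditions as the init script, with the
# branch bodies replaced by echo. Function names are hypothetical.

original_order() {
   qmaster=$1
   qmaster_host=$2
   if [ "$qmaster" = true ]; then
      echo "start branch"
   elif [ "$qmaster" = true -a "$qmaster_host" = false ]; then
      # Unreachable: qmaster=true is always caught by the branch above.
      echo "not a qmaster host"
   fi
}

fixed_order() {
   qmaster=$1
   qmaster_host=$2
   # Test the more specific condition first so it can actually match.
   if [ "$qmaster" = true -a "$qmaster_host" = false ]; then
      echo "not a qmaster host"
   elif [ "$qmaster" = true ]; then
      echo "start branch"
   fi
}

original_order true false   # prints "start branch"
fixed_order true false      # prints "not a qmaster host"
fixed_order true true       # prints "start branch"
```

In the real script the start branch does nothing further when qmaster_host is false and SMF is not in use, which would be consistent with the init.d script producing no output at all.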
-----Original Message-----
From: Lubomir.Petrik at sun.com [mailto:Lubomir.Petrik at sun.com] 
Sent: Thursday, March 26, 2009 10:26 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] SGE 6.2u2 Install Fails "Admin User Missing" on all hosts

Hmm. It seems our detection is not completely correct. Since the
installation was able to finish, you have nothing to worry about. Can
you try the following command on an execd host? Did it succeed?

$SGE_ROOT/utilbin/`$SGE_ROOT/util/arch`/adminrun sge echo
echo $?

Lubos.

mhanby wrote:
> Howdy,
>
> We are testing out an install of SGE 6.2u2 using the RPMs for RHEL5
> 64bit.
>
> We chose custom install and set the Admin user to sge (which exists on
> the qmaster and exec hosts). When we click install, all nodes
> (including the qmaster) report that the admin user is missing.
>
> We've even tried using root as a test; that account is also reported
> as missing.
>
> If I tell it to proceed (using Admin user sge), all of the nodes are
> reported as installing successfully.
>
> The log for the exec hosts is as follows, the only error being that
> TERM is not set in the environment:
>
> OUTPUT:
> Your $SGE_ROOT directory: /share/sge
> Using cell: >default<
> Creating local configuration for host >compute-0-0.local<
> sge at compute-0-0.local added "compute-0-0.local" to configuration list
> Local configuration for host >compute-0-0.local< created.
> Adding submit host >compute-0-0<
> compute-0-0.local added to submit host list
> cp /share/sge/default/common/sgeexecd /etc/init.d/sgeexecd.p6444
> /usr/lib/lsb/install_initd /etc/init.d/sgeexecd.p6444
>    starting sge_execd
> root at compute-0-0.local modified "@allhosts" in host group list
> root at compute-0-0.local modified "all.q" in cluster queue list
> Execd on host compute-0-0.local is running!
>
> ERROR:
> TERM environment variable not set.
>
> Mike
>
> ------------------------------------------------------
>
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=144047
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>



