[GE users] Autoinstallation of SGE 6.0u1 on Solaris 10 zones

Bernard Li bli at bcgsc.ca
Fri Jan 14 19:35:30 GMT 2005


Hi Marco:

I have always done a fresh autoinstallation (i.e. revert the directory
to the state before installation), so I don't think the problem is with
the existence of directories.
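
A minimal sketch of a pre-flight check for the case Marco describes in
the quoted message below, where autoinstall stops if the cell directory
or the Berkeley DB spooling directory already exists. leftover_dirs is a
hypothetical helper, not part of SGE, and it assumes the cell directory
is $SGE_ROOT/$CELL_NAME, as the template comments state.

```shell
#!/bin/sh
# leftover_dirs is a hypothetical helper, not part of the SGE distribution.
# Usage: leftover_dirs SGE_ROOT CELL_NAME DB_SPOOLING_DIR
# Prints each path that already exists.
# Returns 0 if at least one was found, 1 if none were found.
leftover_dirs() {
    found=1
    for dir in "$1/$2" "$3"; do
        if [ -e "$dir" ]; then
            echo "exists: $dir"
            found=0
        fi
    done
    return "$found"
}
```

With the values from the config loaded into the shell, it could be
called as: leftover_dirs "$SGE_ROOT" "$CELL_NAME" "$DB_SPOOLING_DIR"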

The following is the template I've used:

---
#-------------------------------------------------
# SGE default configuration file
#-------------------------------------------------

# Always use fully qualified pathnames, please

# SGE_ROOT Path, this is basic information
#(mandatory for qmaster and execd installation)
SGE_ROOT="/opt/sge"

# SGE_QMASTER_PORT is used by qmaster for communication
# Please enter the port in this way: 1300
# Please do not enter it like this: 1300/tcp
#(mandatory for qmaster installation)
SGE_QMASTER_PORT="536"

# SGE_EXECD_PORT is used by execd for communication
# Please enter the port in this way: 1300
# Please do not enter it like this: 1300/tcp
#(mandatory for qmaster installation)
SGE_EXECD_PORT="537"

# CELL_NAME, will be a dir in SGE_ROOT, contains the common dir
# Please enter only the name of the cell. No path, please
#(mandatory for qmaster and execd installation)
CELL_NAME="default"

# The dir where qmaster spools those parts that are not spooled by the DB
#(mandatory for qmaster installation)
QMASTER_SPOOL_DIR="/opt/sge/default/spool/qmaster"

# The dir, where the execd spools (active jobs)
# This entry is needed, even if you are going to use
# berkeley db spooling. Only cluster configuration and jobs will
# be spooled in the database. The execution daemon still needs a spool
# directory  
#(mandatory for qmaster installation)
EXECD_SPOOL_DIR="/opt/sge/common/default/spool"

# For monitoring and accounting of jobs, every job will get
# unique GID. So you have to enter a free GID Range, which
# is assigned to each job running on a machine.
# If you want to run 100 Jobs at the same time on one host you
# have to enter a GID-Range like that: 16000-16100
#(mandatory for qmaster installation)
GID_RANGE="20000-20100"

# If SGE is compiled with -spool-dynamic, you have to enter here, which
# spooling method should be used. (classic or berkeleydb)
#(mandatory for qmaster installation)
SPOOLING_METHOD="berkeleydb"

# Name of the server where the spooling DB is running
# If the spooling method is berkeleydb, it must be "none" when
# using no spooling server, and it must contain the server name
# if a server should be used. In case of "classic" spooling, it
# can be left out
DB_SPOOLING_SERVER="none"

# The dir, where the DB spools
# If berkeley db spooling is used, it must contain the path to
# the spooling db. Please enter the full path. (eg. /tmp/data/spooldb)
# Remember, this directory must be local on the qmaster host or on the
# Berkeley DB Server host. No NFS mount, please
DB_SPOOLING_DIR="spooldb"

# A List of Host which should become admin hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
ADMIN_HOST_LIST="headnode node01 node02 node03 node04 node05 node06
node07 node08 node09 node10 node11 node12 node13 node14 node15 node16
node17 node18 node19 node20"

# A List of Host which should become submit hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
SUBMIT_HOST_LIST="headnode"


# A List of Host which should become exec hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
# (mandatory for execution host installation)
EXEC_HOST_LIST="node01 node02 node03 node04 node05 node06 node07 node08
node09 node10 node11 node12 node13 node14 node15 node16 node17 node18
node19 node20"

# The dir, where the execd spools (local configuration)
# If you want configure your execution daemons to spool in
# a local directory, you have to enter this directory here.
# If you do not want to configure a local execution host spool directory
# please leave this empty
EXECD_SPOOL_DIR_LOCAL="/opt/sge/default/spool"

# If true, the domain names will be ignored during hostname
# resolving; if false, the fully qualified domain name will be used
# for name resolving
HOSTNAME_RESOLVING="true"

# Shell, which should be used for remote installation (rsh/ssh)
# This is only supported if your hosts and rshd/sshd are configured
# not to ask for a password or prompt with any message.
SHELL_NAME="ssh"

# Enter your default domain, if you are using /etc/hosts or NIS
# configuration
DEFAULT_DOMAIN="none"

# If a job stops, fails, or finishes, you can send a mail to this address
ADMIN_MAIL="none"

# If true, the rc scripts (sgemaster, sgeexecd, sgebdb) will be added,
# to start automatically during boottime
ADD_TO_RC="false"

# If this is "true", the file permissions of executables will be set to
# 755 and those of ordinary files to 644.
SET_FILE_PERMS="true"

# This option is not implemented, yet.
# When an exechost should be uninstalled, the running jobs will be
# rescheduled
RESCHEDULE_JOBS="wait"

# Enter one of the three distributed scheduler tuning configuration
# sets (1=normal, 2=high, 3=max)
SCHEDD_CONF="1"

# The name of the shadow host. This host must have read/write permission
# to the qmaster spool directory
# If you want to setup a shadow host, you must enter the servername
# (mandatory for shadowhost installation)
SHADOW_HOST="hostname"

# Remove these execution hosts in automatic mode
# (mandatory for uninstallation of execution hosts)
EXEC_HOST_LIST_RM="host1 host2 host3 host4"

# This option is used for startup script removing. 
# If true, all rc startup scripts will be removed during
# automatic deinstallation. If false, the scripts won't
# be touched.
# (mandatory for uninstallation of execution/qmaster hosts)
REMOVE_RC="false"
---
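
The template comments ask for fully qualified pathnames. A minimal
sketch of a check for that, assuming single-line VAR="value" settings as
in the template above; check_abs_paths is a hypothetical helper, not
part of SGE. Run against the template above it would report
DB_SPOOLING_DIR="spooldb". Whether that has anything to do with the
spooldefaults failure is not established in this thread.

```shell
#!/bin/sh
# check_abs_paths is a hypothetical helper, not part of the SGE distribution.
# Usage: check_abs_paths CONFIG_FILE
# Reads single-line VAR="value" settings and reports path variables whose
# value is neither empty nor starting with "/".
# Returns 1 if anything was reported, 0 otherwise.
check_abs_paths() {
    conf=$1
    status=0
    for var in SGE_ROOT QMASTER_SPOOL_DIR EXECD_SPOOL_DIR \
               DB_SPOOLING_DIR EXECD_SPOOL_DIR_LOCAL; do
        # Extract the text between the quotes without executing the file.
        val=$(sed -n "s/^${var}=\"\\(.*\\)\"[[:space:]]*\$/\\1/p" "$conf")
        case $val in
            ""|/*) ;;   # unset, empty, or absolute: nothing to report
            *) echo "$var is not an absolute path: $val"; status=1 ;;
        esac
    done
    return "$status"
}
```

Usage: check_abs_paths /tmp/gsc_template.conf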

I don't think they appear as NFS mounts, but you make a good point:
perhaps Berkeley DB does not like this special method of sharing
filesystems. Here's a df of what the directory looks like on
one of the nodes:

-bash-2.05b# df -h
Filesystem             size   used  avail capacity  Mounted on
/                      6.9G   3.9G   2.9G    58%    /
/dev                   6.9G   3.9G   2.9G    58%    /dev
/export/home            26G    27M    26G     1%    /export/home
/lib                   6.9G   3.9G   2.9G    58%    /lib
/opt                   6.9G   3.9G   2.9G    58%    /opt

/opt is the directory being shared out.
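
The df -h output above does not show the filesystem type. A minimal
sketch for checking it, assuming Solaris df, where "df -n DIR" prints
"mountpoint : fstype". Both helpers are hypothetical, not part of SGE.
A loopback mount would show up as lofs rather than nfs; whether Berkeley
DB spooling works on such a mount is not established here.

```shell
#!/bin/sh
# Both helpers are hypothetical, not part of the SGE distribution.

# fs_type_of DIR: print the filesystem type of DIR.
# Assumes Solaris df, where "df -n DIR" prints "mountpoint : fstype",
# so the type is the last field of the line.
fs_type_of() {
    df -n "$1" | awk '{ print $NF }'
}

# is_nfs_type TYPE: succeed if TYPE looks like an NFS type name.
is_nfs_type() {
    case $1 in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

On a node this could be used as:
is_nfs_type "$(fs_type_of /opt)" && echo "/opt is NFS"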

I can try not to use berkeleydb and see if that works.
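
A minimal sketch of producing a copy of the config with the other
spooling method, using the two values the template comments list
(classic or berkeleydb). set_spooling_method is a hypothetical helper,
not part of SGE, and the output file name in the usage line is only an
example.

```shell
#!/bin/sh
# set_spooling_method is a hypothetical helper, not part of SGE.
# Usage: set_spooling_method CONFIG_FILE METHOD > NEW_CONFIG_FILE
# Writes CONFIG_FILE to stdout with SPOOLING_METHOD set to METHOD.
# The original file is left unchanged.
set_spooling_method() {
    case $2 in
        classic|berkeleydb) ;;
        *) echo "unknown spooling method: $2" >&2; return 1 ;;
    esac
    sed "s/^SPOOLING_METHOD=.*/SPOOLING_METHOD=\"$2\"/" "$1"
}
```

Usage:
set_spooling_method /tmp/gsc_template.conf classic > /tmp/gsc_classic.conf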

Thanks,

Bernard 

> -----Original Message-----
> From: Marco Donauer [mailto:Marco.Donauer at Sun.COM] 
> Sent: Friday, January 14, 2005 11:30
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Autoinstallation of SGE 6.0u1 on 
> Solaris 10 zones
> 
> Bernhard,
> 
> hm there is something going wrong with berkeley db init.
> It looks like a corrupted command.
> 
> For your information, the autoinstall does not delete any existing
> directories; this prevents unwanted deletion of system data or
> databases of other clusters.
> After a failed autoinstallation the SGE_CELL directory and the
> Berkeley DB spooling directory are still there. When the
> autoinstallation finds one of these dirs, it aborts.
> But the error log doesn't look like this.
> 
> Could you send me the install config? (/tmp/gsc_template.conf )
> 
> Is it right that, when using zones, the local directories look as
> if they were NFS-mounted?
> Please consider this when using Berkeley DB: Berkeley DB does not
> run on an NFS mount.
> 
> Regards,
> Marco
> 
> Bernard Li wrote:
> 
> >In case the attached log didn't show up, I'm pasting it inline...
> >
> >---
> >Starting qmaster installation!
> >Reading configuration from file /tmp/gsc_template.conf
> >
> >
> >
> >Your $SGE_ROOT directory: /opt/sge
> >
> >Using SGE_QMASTER_PORT >536<.
> >
> >Using SGE_EXECD_PORT >537<.
> >
> >Using >default< as CELL_NAME.
> >Using >/opt/sge/default/spool/qmaster< as QMASTER_SPOOL_DIR.
> >
> >
> >Using >true< as IGNORE_FQDN_DEFAULT.
> >If it's >true<, the domainname will be ignored.
> >
> >Making directories
> >
> >Setting spooling method to dynamic
> >
> >Dumping bootstrapping information
> >Initializing spooling database
> >
> >
> >Using >20000-20100< as gid range.
> >Using >/opt/sge/common/default/spool< as EXECD_SPOOL_DIR.
> >Using >none< as ADMIN_MAIL.
> >Reading in complex attributes.
> >Adding default parallel environments (PE)
> >Reading in parallel environments:
> >	PE "make".
> >Reading in usersets:
> >	Userset "defaultdepartment".
> >	Userset "deadlineusers".
> >usage:
> > ./utilbin/sol-x86/spooldefaults command
> >
> >create default entries during installation process
> >following are the valid commands:
> >test                          test the spooling framework
> >adminhosts <template_dir>     create admin hosts
> >calendars <template_dir>      create calendars
> >ckpts <template_dir>          create checkpoint environments
> >complexes <template_dir>      create complexes
> >configuration <template>      create the global configuration
> >cqueues <template_dir>        create cluster queues
> >exechosts <template_dir>      create execution hosts
> >local_conf <template> <name>  create a local configuration
> >managers <mgr1> [<mgr2> ...]  create managers
> >operators <op1> [<op2> ...]   create operators
> >pes <template_dir>            create parallel environments
> >projects <template_dir>       create projects
> >sharetree <template>          create sharetree
> >submithosts <template_dir>    create submit hosts
> >users <template_dir>          create users
> >usersets <template_dir>       create usersets
> >
> >Command failed: ./utilbin/sol-x86/spooldefaults managers
> >
> >Probably a permission problem. Please check file access permissions.
> >Check read/write permission. Check if SGE daemons are running.
> >
> >Command failed: managers./utilbin/sol-x86/spooldefaults
> >Probably a permission problem. Please check file access permissions.
> >Check read/write permission. Check if SGE daemons are running.
> >---
> >
> >Cheers,
> >
> >Bernard
> >
> >  
> >
> >>-----Original Message-----
> >>From: Bernard Li [mailto:bli at bcgsc.ca]
> >>Sent: Friday, January 14, 2005 10:56
> >>To: users at gridengine.sunsource.net
> >>Subject: RE: [GE users] Autoinstallation of SGE 6.0u1 on Solaris 10 
> >>zones
> >>
> >>Hi Marco:
> >>
> >>Thanks - okay this is clearer now.  I have chown'ed the directory of
> >>SGE_ROOT to sgeadmin and ran the autoinstallation, however I got the
> >>attached error log.
> >>
> >>I have already chown and chgrp recursively the SGE_ROOT directory, so
> >>I don't understand why it could be a permission issue.
> >>
> >>Are any of these bugs going to be fixed in 6.0u2?  I think getting 
> >>autoinstallation working seamlessly is very important as this makes 
> >>deployment of SGE on large clusters much easier.
> >>
> >>Thanks,
> >>
> >>Bernard
> >>
> >>    
> >>
> >>>-----Original Message-----
> >>>From: Marco Donauer [mailto:Marco.Donauer at Sun.COM]
> >>>Sent: Friday, January 14, 2005 0:24
> >>>To: users at gridengine.sunsource.net
> >>>Subject: Re: [GE users] Autoinstallation of SGE 6.0u1 on Solaris 10
> >>>zones
> >>>
> >>>Bernhard,
> >>>
> >>>no there is still a bug in the autoinstall.
> >>>The auto procedure uses the owner of the SGE_ROOT as admin user.
> >>>I know this is bad, but if your SGE_ROOT dir is owned by the
> >>>"sgeadmin" user, the admin user will be "sgeadmin".
> >>>
> >>>Regards,
> >>>Marco
> >>>
> >>>Bernard Li wrote:
> >>>
> >>>      
> >>>
> >>>>Hello all:
> >>>>
> >>>>Trying to set up SGE 6.0u1 with Solaris 10 containers/zones using 
> >>>>autoinstallation.
> >>>>
> >>>>In the manual it says that 'You must change the ownership of the
> >>>>sge-root directory to belong to your existing administrative user'.
> >>>>Let's say I want the administrative user to be 'sgeadmin' - does that
> >>>>mean I need to install qmaster manually as root, set sgeadmin as the
> >>>>administrative user then continue with autoinstallation using the
> >>>>sgeadmin user account?
> >>>>
> >>>>Thanks,
> >>>>
> >>>>Bernard

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



