[GE users] Autoinstallation of SGE 6.0u1 on Solaris 10 zones

Marco Donauer Marco.Donauer at Sun.COM
Mon Jan 17 11:21:29 GMT 2005


Hi Bernard,

The install script has to be executed as user root.
Installing as any other user may result in a non-working or only
partially working Grid Engine.

I suspect chroot will cause problems, because the install script needs
binaries from /usr/bin (ls, touch, rm, cat), which won't be reachable
in that case.
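
A quick way to verify that (just a sketch; using /opt/sge as the chroot
target is only an assumption based on this thread):

    # inside the chroot, /usr/bin/ls resolves to /opt/sge/usr/bin/ls,
    # so this probe fails if the utilities are not present there
    chroot /opt/sge /usr/bin/ls /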

You can install the execd hosts using ./inst_sge -x -auto <conf>,
provided rsh/ssh allows the user root to log in without being asked
for a password.
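
For example (only a sketch - node01 and the config file path are
placeholders taken from this thread; BatchMode=yes makes ssh fail
instead of prompting for a password, so it doubles as a check):

    # check that root can reach an execution host without a password
    ssh -o BatchMode=yes root@node01 true && echo "root ssh OK"

    # then, as root on the qmaster, run the automatic execd
    # installation, reusing the qmaster installation's config file
    cd /opt/sge
    ./inst_sge -x -auto /tmp/gsc_template.conf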

Regards,
Marco

Bernard Li wrote:
> Hi Marco:
> 
> Attached is the log file with debug.
> 
> spooldb should be seen as a 'local' file system on the node; it is
> also shared out from /opt.
> 
> Yes, I used the pre-compiled binaries - I was able to install SGE
> manually as root, but I never had any success as the 'sgeadmin' user.
> 
> If I manually install the qmaster as root and chroot SGE_ROOT, do you
> think I can install the execution hosts automatically by running:
> 
> ./inst_sge -x -auto <conf>
> 
> ?
> 
> Thanks,
> 
> Bernard
> 
>>-----Original Message-----
>>From: Marco Donauer [mailto:Marco.Donauer at Sun.COM] 
>>Sent: Friday, January 14, 2005 11:51
>>To: users at gridengine.sunsource.net
>>Subject: Re: [GE users] Autoinstallation of SGE 6.0u1 on 
>>Solaris 10 zones
>>
>>Bernard,
>>
>>You're right, it doesn't look like a directory problem; that was
>>meant as additional info only.
>>Your config file is OK. Is the spooldb directory on a local file
>>system? Has it been created?
>>I think so, because the creation and first init of the Berkeley DB
>>works. It fails at the point where the managers are added!
>>
>>Did you use precompiled binaries? Is it possible that your package is
>>corrupt?
>>Does the manual installation work?
>>
>>At the moment I have no further ideas.
>>Could you send me the script debug output?
>>Do you know the set -x shell switch?
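>>
>>(One way to capture that - a sketch assuming a Bourne-compatible
>>shell; the -m flag for a qmaster autoinstall and the log path are
>>only suggestions:)
>>
>>    sh -x ./inst_sge -m -auto /tmp/gsc_template.conf 2>&1 \
>>        | tee /tmp/inst_sge_debug.log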
>>
>>Regards,
>>Marco
>>
>>Bernard Li wrote:
>>
>>
>>>Hi Marco:
>>>
>>>I have always done a fresh autoinstallation (i.e. reverted the
>>>directory to the state before installation), so I don't think the
>>>problem is with the existence of directories.
>>>
>>>The following is the template I've used:
>>>
>>>---
>>>#-------------------------------------------------
>>># SGE default configuration file
>>>#-------------------------------------------------
>>>
>>># Always use fully qualified pathnames, please
>>>
>>># SGE_ROOT path, this is basic information
>>>#(mandatory for qmaster and execd installation)
>>>SGE_ROOT="/opt/sge"
>>>
>>># SGE_QMASTER_PORT is used by qmaster for communication
>>># Please enter the port in this way: 1300
>>># Please do not enter it like this: 1300/tcp
>>>#(mandatory for qmaster installation)
>>>SGE_QMASTER_PORT="536"
>>>
>>># SGE_EXECD_PORT is used by execd for communication
>>># Please enter the port in this way: 1300
>>># Please do not enter it like this: 1300/tcp
>>>#(mandatory for qmaster installation)
>>>SGE_EXECD_PORT="537"
>>>
>>># CELL_NAME will be a dir in SGE_ROOT and contains the common dir
>>># Please enter only the name of the cell, no path
>>>#(mandatory for qmaster and execd installation)
>>>CELL_NAME="default"
>>>
>>># The dir where qmaster spools the parts which are not spooled by DB
>>>#(mandatory for qmaster installation)
>>>QMASTER_SPOOL_DIR="/opt/sge/default/spool/qmaster"
>>>
>>># The dir where the execd spools (active jobs)
>>># This entry is needed even if you are going to use Berkeley DB
>>># spooling. Only the cluster configuration and jobs will be spooled
>>># in the database; the execution daemon still needs a spool directory
>>>#(mandatory for qmaster installation)
>>>EXECD_SPOOL_DIR="/opt/sge/common/default/spool"
>>>
>>># For monitoring and accounting of jobs, every job will get a unique
>>># GID, so you have to enter a free GID range which is assigned to
>>># each job running on a machine.
>>># If you want to run 100 jobs at the same time on one host, you have
>>># to enter a GID range like this: 16000-16100
>>>#(mandatory for qmaster installation)
>>>GID_RANGE="20000-20100"
>>>
>>># If SGE is compiled with -spool-dynamic, you have to enter here
>>># which spooling method should be used (classic or berkeleydb)
>>>#(mandatory for qmaster installation)
>>>SPOOLING_METHOD="berkeleydb"
>>>
>>># Name of the server the spooling DB is running on
>>># If the spooling method is berkeleydb, it must be "none" when using
>>># no spooling server, and it must contain the server name if a server
>>># should be used. In case of "classic" spooling it can be left out
>>>DB_SPOOLING_SERVER="none"
>>>
>>># The dir where the DB spools
>>># If Berkeley DB spooling is used, it must contain the path to the
>>># spooling db. Please enter the full path (e.g. /tmp/data/spooldb).
>>># Remember, this directory must be local on the qmaster host or on
>>># the Berkeley DB server host. No NFS mount, please
>>>DB_SPOOLING_DIR="spooldb"
>>>
>>># A list of hosts which should become admin hosts
>>># If you do not enter any host here, you have to add all of your
>>># hosts by hand after the installation. The autoinstallation works
>>># without any entry
>>>ADMIN_HOST_LIST="headnode node01 node02 node03 node04 node05 node06
>>>node07 node08 node09 node10 node11 node12 node13 node14 node15 node16
>>>node17 node18 node19 node20"
>>>
>>># A list of hosts which should become submit hosts
>>># If you do not enter any host here, you have to add all of your
>>># hosts by hand after the installation. The autoinstallation works
>>># without any entry
>>>SUBMIT_HOST_LIST="headnode"
>>>
>>># A list of hosts which should become exec hosts
>>># If you do not enter any host here, you have to add all of your
>>># hosts by hand after the installation. The autoinstallation works
>>># without any entry
>>># (mandatory for execution host installation)
>>>EXEC_HOST_LIST="node01 node02 node03 node04 node05 node06 node07
>>>node08 node09 node10 node11 node12 node13 node14 node15 node16 node17
>>>node18 node19 node20"
>>>
>>># The dir where the execd spools (local configuration)
>>># If you want to configure your execution daemons to spool in a
>>># local directory, you have to enter this directory here.
>>># If you do not want to configure a local execution host spool
>>># directory, please leave this empty
>>>EXECD_SPOOL_DIR_LOCAL="/opt/sge/default/spool"
>>>
>>># If true, the domain names will be ignored during hostname
>>># resolving; if false, the fully qualified domain name will be used
>>># for name resolving
>>>HOSTNAME_RESOLVING="true"
>>>
>>># Shell which should be used for remote installation (rsh/ssh)
>>># This is only supported if your hosts and rshd/sshd are configured
>>># not to ask for a password or prompt any message.
>>>SHELL_NAME="ssh"
>>>
>>># Enter your default domain, if you are using /etc/hosts or NIS
>>># configuration
>>>DEFAULT_DOMAIN="none"
>>>
>>># If a job stops, fails, or finishes, a mail can be sent to this
>>># address
>>>ADMIN_MAIL="none"
>>>
>>># If true, the rc scripts (sgemaster, sgeexecd, sgebdb) will be added
>>># to start automatically during boot time
>>>ADD_TO_RC="false"
>>>
>>># If this is "true", the file permissions of executables will be set
>>># to 755 and of ordinary files to 644.
>>>SET_FILE_PERMS="true"
>>>
>>># This option is not implemented yet.
>>># When an exec host is uninstalled, the running jobs will be
>>># rescheduled
>>>RESCHEDULE_JOBS="wait"
>>>
>>># Enter one of the three distributed scheduler tuning configuration
>>># sets (1=normal, 2=high, 3=max)
>>>SCHEDD_CONF="1"
>>>
>>># The name of the shadow host. This host must have read/write
>>># permission to the qmaster spool directory.
>>># If you want to set up a shadow host, you must enter the server name
>>># (mandatory for shadow host installation)
>>>SHADOW_HOST="hostname"
>>>
>>># Remove these execution hosts in automatic mode
>>># (mandatory for uninstallation of execution hosts)
>>>EXEC_HOST_LIST_RM="host1 host2 host3 host4"
>>>
>>># This option is used for startup script removal.
>>># If true, all rc startup scripts will be removed during automatic
>>># deinstallation; if false, the scripts won't be touched.
>>># (mandatory for uninstallation of execution/qmaster hosts)
>>>REMOVE_RC="false"
>>>---
>>>
>>>I don't think they appear as NFS mounts, but you made a good point
>>>that perhaps Berkeley DB does not like this special method of sharing
>>>filesystems. Anyway, here's a df of what the directory looks like on
>>>one of the nodes:
>>>
>>>-bash-2.05b# df -h
>>>Filesystem             size   used  avail capacity  Mounted on
>>>/                      6.9G   3.9G   2.9G    58%    /
>>>/dev                   6.9G   3.9G   2.9G    58%    /dev
>>>/export/home            26G    27M    26G     1%    /export/home
>>>/lib                   6.9G   3.9G   2.9G    58%    /lib
>>>/opt                   6.9G   3.9G   2.9G    58%    /opt
>>>
>>>/opt is the directory being shared out.
>>>
>>>I can try not to use berkeleydb and see if that works.
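>>>
>>>(Presumably that would just mean changing this template entry - a
>>>sketch based only on the template's own comments, not something I've
>>>tested here:)
>>>
>>>    SPOOLING_METHOD="classic"
>>>    # per the template comments, DB_SPOOLING_SERVER can then be left
>>>    # out, and the Berkeley DB spooling dir is no longer relevant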
>>>
>>>Thanks,
>>>
>>>Bernard
>>>
>>>>-----Original Message-----
>>>>From: Marco Donauer [mailto:Marco.Donauer at Sun.COM]
>>>>Sent: Friday, January 14, 2005 11:30
>>>>To: users at gridengine.sunsource.net
>>>>Subject: Re: [GE users] Autoinstallation of SGE 6.0u1 on Solaris 10 
>>>>zones
>>>>
>>>>Bernard,
>>>>
>>>>Hm, something is going wrong with the Berkeley DB init.
>>>>It looks like a corrupted command.
>>>>
>>>>One piece of info for you: the autoinstall does not delete any
>>>>existing directories; this is to prevent unwanted deletion of system
>>>>data or databases of other clusters.
>>>>After a failed autoinstallation, the SGE_CELL directory and the
>>>>Berkeley DB spooling directory are still there. When the
>>>>autoinstallation finds one of these dirs, it breaks.
>>>>But your error log doesn't look like this.
>>>>
>>>>Could you send me the install config? (/tmp/gsc_template.conf)
>>>>
>>>>Is it right that, when using zones, the local directories look as
>>>>though they are NFS mounted?
>>>>Please consider this when using Berkeley DB: Berkeley DB will not
>>>>run on an NFS mount.
>>>>
>>>>Regards,
>>>>Marco
>>>>
>>>>Bernard Li wrote:
>>>>
>>>>>In case the attached log didn't show up, I'm pasting it inline...
>>>>>
>>>>>---
>>>>>Starting qmaster installation!
>>>>>Reading configuration from file /tmp/gsc_template.conf
>>>>>
>>>>>
>>>>>
>>>>>Your $SGE_ROOT directory: /opt/sge
>>>>>
>>>>>Using SGE_QMASTER_PORT >536<.
>>>>>
>>>>>Using SGE_EXECD_PORT >537<.
>>>>>
>>>>>Using >default< as CELL_NAME.
>>>>>Using >/opt/sge/default/spool/qmaster< as QMASTER_SPOOL_DIR.
>>>>>
>>>>>
>>>>>Using >true< as IGNORE_FQDN_DEFAULT.
>>>>>If it's >true<, the domainname will be ignored.
>>>>>
>>>>>Making directories
>>>>>
>>>>>Setting spooling method to dynamic
>>>>>
>>>>>Dumping bootstrapping information
>>>>>Initializing spooling database
>>>>>
>>>>>
>>>>>Using >20000-20100< as gid range.
>>>>>Using >/opt/sge/common/default/spool< as EXECD_SPOOL_DIR.
>>>>>Using >none< as ADMIN_MAIL.
>>>>>Reading in complex attributes.
>>>>>Adding default parallel environments (PE)
>>>>>Reading in parallel environments:
>>>>>	PE "make".
>>>>>Reading in usersets:
>>>>>	Userset "defaultdepartment".
>>>>>	Userset "deadlineusers".
>>>>>usage:
>>>>>./utilbin/sol-x86/spooldefaults command
>>>>>
>>>>>create default entries during installation process
>>>>>following are the valid commands:
>>>>>test                          test the spooling framework
>>>>>adminhosts <template_dir>     create admin hosts
>>>>>calendars <template_dir>      create calendars
>>>>>ckpts <template_dir>          create checkpoint environments
>>>>>complexes <template_dir>      create complexes
>>>>>configuration <template>      create the global configuration
>>>>>cqueues <template_dir>        create cluster queues
>>>>>exechosts <template_dir>      create execution hosts
>>>>>local_conf <template> <name>  create a local configuration
>>>>>managers <mgr1> [<mgr2> ...]  create managers
>>>>>operators <op1> [<op2> ...]   create operators
>>>>>pes <template_dir>            create parallel environments
>>>>>projects <template_dir>       create projects
>>>>>sharetree <template>          create sharetree
>>>>>submithosts <template_dir>    create submit hosts
>>>>>users <template_dir>          create users
>>>>>usersets <template_dir>       create usersets
>>>>>
>>>>>Command failed: ./utilbin/sol-x86/spooldefaults managers
>>>>>
>>>>>Probably a permission problem. Please check file access permissions.
>>>>>Check read/write permission. Check if SGE daemons are running.
>>>>>
>>>>>Command failed: managers./utilbin/sol-x86/spooldefaults
>>>>>Probably a permission problem. Please check file access permissions.
>>>>>Check read/write permission. Check if SGE daemons are running.
>>>>>---
>>>>>
>>>>>Cheers,
>>>>>
>>>>>Bernard
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Bernard Li [mailto:bli at bcgsc.ca]
>>>>>>Sent: Friday, January 14, 2005 10:56
>>>>>>To: users at gridengine.sunsource.net
>>>>>>Subject: RE: [GE users] Autoinstallation of SGE 6.0u1 on 
>>
>>Solaris 10 
>>
>>>>>>zones
>>>>>>
>>>>>>Hi Marco:
>>>>>>
>>>>>>Thanks - okay, this is clearer now. I have chown'ed the directory
>>>>>>of SGE_ROOT to sgeadmin and ran the autoinstallation; however, I
>>>>>>got the attached error log.
>>>>>>
>>>>>>I have already chown'ed and chgrp'ed the SGE_ROOT directory
>>>>>>recursively, so I don't understand why it could be a permission
>>>>>>issue.
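>>>>>>
>>>>>>(That recursive ownership change, for reference - a sketch
>>>>>>assuming "sgeadmin" is both the user and the group name, which may
>>>>>>differ on another system:)
>>>>>>
>>>>>>    chown -R sgeadmin /opt/sge
>>>>>>    chgrp -R sgeadmin /opt/sge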
>>>>>>
>>>>>>Are any of these bugs going to be fixed in 6.0u2? I think getting
>>>>>>autoinstallation working seamlessly is very important, as this
>>>>>>makes deployment of SGE on large clusters much easier.
>>>>>>
>>>>>>Thanks,
>>>>>>
>>>>>>Bernard
>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: Marco Donauer [mailto:Marco.Donauer at Sun.COM]
>>>>>>>Sent: Friday, January 14, 2005 0:24
>>>>>>>To: users at gridengine.sunsource.net
>>>>>>>Subject: Re: [GE users] Autoinstallation of SGE 6.0u1 on Solaris
>>>>>>>10 zones
>>>>>>>
>>>>>>>Bernard,
>>>>>>>
>>>>>>>No, there is still a bug in the autoinstall.
>>>>>>>The auto procedure uses the owner of SGE_ROOT as the admin user.
>>>>>>>I know this is bad, but if your SGE_ROOT dir is owned by the
>>>>>>>"sgeadmin" user, the adminuser will be "sgeadmin".
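>>>>>>>
>>>>>>>(In other words, whichever user owns the directory - e.g. as
>>>>>>>shown by the check below, with the path taken from this thread -
>>>>>>>becomes the admin user:)
>>>>>>>
>>>>>>>    ls -ld /opt/sge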
>>>>>>>
>>>>>>>Regards,
>>>>>>>Marco
>>>>>>>
>>>>>>>Bernard Li wrote:
>>>>>>>
>>>>>>>>Hello all:
>>>>>>>>
>>>>>>>>Trying to set up SGE 6.0u1 with Solaris 10 containers/zones
>>>>>>>>using autoinstallation.
>>>>>>>>
>>>>>>>>In the manual it says that 'You must change the ownership of the
>>>>>>>>sge-root directory to belong to your existing administrative
>>>>>>>>user'.
>>>>>>>>
>>>>>>>>Let's say I want the administrative user to be 'sgeadmin' - does
>>>>>>>>that mean I need to install the qmaster manually as root, set
>>>>>>>>sgeadmin as the administrative user, and then continue with the
>>>>>>>>autoinstallation using the sgeadmin user account?
>>>>>>>>
>>>>>>>>Thanks,
>>>>>>>>
>>>>>>>>Bernard

-- 

Marco Donauer            Tel: +49 941 3075-211  (x60211)
Software Engineer        Fax: +49 941 3075-222  (x60222)
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7    mailto:marco.donauer at sun.com
D-93049 Regensburg       http://www.sun.com/gridware


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



