[GE users] Autoinstallation of SGE 6.0u1 on Solaris 10 zones

Marco Donauer Marco.Donauer at Sun.COM
Fri Jan 14 19:50:59 GMT 2005



Bernhard,

You're right, it doesn't look like a directory problem. That was meant as
additional information only.
Your config file is OK. Is the directory spooldb on a local file system?
Was it created?
I think so, because the creation and first init of the Berkeley DB works.
It fails at the point where the managers are added!
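
For example, something like this (just a sketch; the /opt/sge/spooldb path
is an assumption, since DB_SPOOLING_DIR="spooldb" is relative in your
template - adjust it to wherever the directory actually ended up):

# Does the spooling directory exist, and which filesystem is it on?
ls -ld /opt/sge/spooldb
df -k /opt/sge/spooldb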

Did you use precompiled binaries? Is it possible that your package is corrupt?
Does the manual installation work?
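
You could also try running the failing command by hand, for example
(illustrative; run it as the admin user from your $SGE_ROOT; the usage
text in your log lists a "test" command for the spooling framework):

# Exercise the spooling framework directly, outside the installer
cd /opt/sge
./utilbin/sol-x86/spooldefaults test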

At the moment I have no further ideas.
Could you send me the script debug output?
Do you know the set -x shell switch?
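
A sketch of what I mean (the inst_sge invocation below is only an
assumption - prefix whatever command you normally use to start the
autoinstallation with "sh -x" and redirect the trace to a file):

# Re-run the autoinstallation with a full shell trace written to a log file
cd /opt/sge
sh -x ./inst_sge -m -auto /tmp/gsc_template.conf > /tmp/sge_install_trace.log 2>&1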

Regards,
Marco

Bernard Li wrote:

>Hi Marco:
>
>I have always done a fresh autoinstallation (i.e. reverting the directory
>to the state before installation), so I don't think the problem is with
>the existence of directories.
>
>The following is the template I've used:
>
>---
>#-------------------------------------------------
># SGE default configuration file
>#-------------------------------------------------
>
># Always use fully qualified pathnames, please
>
># SGE_ROOT Path, this is basic information
>#(mandatory for qmaster and execd installation)
>SGE_ROOT="/opt/sge"
>
># SGE_QMASTER_PORT is used by qmaster for communication
># Please enter the port in this way: 1300
># Please do not enter it like this: 1300/tcp
>#(mandatory for qmaster installation)
>SGE_QMASTER_PORT="536"
>
># SGE_EXECD_PORT is used by execd for communication
># Please enter the port in this way: 1300
># Please do not enter it like this: 1300/tcp
>#(mandatory for qmaster installation)
>SGE_EXECD_PORT="537"
>
># CELL_NAME, will be a dir in SGE_ROOT, contains the common dir
># Please enter only the name of the cell. No path, please
>#(mandatory for qmaster and execd installation)
>CELL_NAME="default"
>
># The dir where qmaster spools those parts which are not spooled by the DB
>#(mandatory for qmaster installation)
>QMASTER_SPOOL_DIR="/opt/sge/default/spool/qmaster"
>
># The dir, where the execd spools (active jobs)
># This entry is needed, even if you are going to use
># berkeley db spooling. Only cluster configuration and jobs will
># be spooled in the database. The execution daemon still needs a spool
># directory  
>#(mandatory for qmaster installation)
>EXECD_SPOOL_DIR="/opt/sge/common/default/spool"
>
># For monitoring and accounting of jobs, every job will get a
># unique GID. So you have to enter a free GID range, which
># is assigned to each job running on a machine.
># If you want to run 100 Jobs at the same time on one host you
># have to enter a GID-Range like that: 16000-16100
>#(mandatory for qmaster installation)
>GID_RANGE="20000-20100"
>
># If SGE is compiled with -spool-dynamic, you have to enter here, which
># spooling method should be used. (classic or berkeleydb)
>#(mandatory for qmaster installation)
>SPOOLING_METHOD="berkeleydb"
>
># Name of the server where the spooling DB is running
># If the spooling method is berkeleydb, it must be "none" when
># using no spooling server, and it must contain the server name
># if a server should be used. In case of "classic" spooling,
># it can be left out
>DB_SPOOLING_SERVER="none"
>
># The dir, where the DB spools
># If berkeley db spooling is used, it must contain the path to
># the spooling db. Please enter the full path. (eg. /tmp/data/spooldb)
># Remember, this directory must be local on the qmaster host or on the
># Berkeley DB server host. No NFS mount, please
>DB_SPOOLING_DIR="spooldb"
>
># A list of hosts which should become admin hosts
># If you do not enter any host here, you have to add all of your hosts
># by hand, after the installation. The autoinstallation works without
># any entry
>ADMIN_HOST_LIST="headnode node01 node02 node03 node04 node05 node06
>node07 node08 node09 node10 node11 node12 node13 node14 node15 node16
>node17 node18 node19 node20"
>
># A list of hosts which should become submit hosts
># If you do not enter any host here, you have to add all of your hosts
># by hand, after the installation. The autoinstallation works without
># any entry
>SUBMIT_HOST_LIST="headnode"
>
>
># A list of hosts which should become exec hosts
># If you do not enter any host here, you have to add all of your hosts
># by hand, after the installation. The autoinstallation works without
># any entry
># (mandatory for execution host installation)
>EXEC_HOST_LIST="node01 node02 node03 node04 node05 node06 node07 node08
>node09
>node10 node11 node12 node13 node14 node15 node16 node17 node18 node19
>node20"
>
># The dir, where the execd spools (local configuration)
># If you want to configure your execution daemons to spool in
># a local directory, you have to enter this directory here.
># If you do not want to configure a local execution host spool directory
># please leave this empty
>EXECD_SPOOL_DIR_LOCAL="/opt/sge/default/spool"
>
># If true, the domain names will be ignored during hostname resolving
># If false, the fully qualified domain name will be used for name resolving
>HOSTNAME_RESOLVING="true"
>
># Shell, which should be used for remote installation (rsh/ssh)
># This is only supported if your hosts and rshd/sshd are configured
># not to ask for a password or prompt any message.
>SHELL_NAME="ssh"
>
># Enter your default domain, if you are using /etc/hosts or NIS
># configuration
>DEFAULT_DOMAIN="none"
>
># If a job stops, fails, or finishes, a mail can be sent to this address
>ADMIN_MAIL="none"
>
># If true, the rc scripts (sgemaster, sgeexecd, sgebdb) will be added
># to start automatically at boot time
>ADD_TO_RC="false"
>
># If this is "true", the file permissions of executables will be set to
># 755 and of ordinary files to 644.
>SET_FILE_PERMS="true"
>
># This option is not implemented yet.
># When an exechost should be uninstalled, the running jobs will be
># rescheduled
>RESCHEDULE_JOBS="wait"
>
># Enter one of the three distributed scheduler tuning configuration
># sets (1=normal, 2=high, 3=max)
>SCHEDD_CONF="1"
>
># The name of the shadow host. This host must have read/write permission
># to the qmaster spool directory
># If you want to set up a shadow host, you must enter the server name
># (mandatory for shadowhost installation)
>SHADOW_HOST="hostname"
>
># Remove these execution hosts in automatic mode
># (mandatory for uninstallation of execution hosts)
>EXEC_HOST_LIST_RM="host1 host2 host3 host4"
>
># This option is used for startup script removing. 
># If true, all rc startup scripts will be removed during
># automatic deinstallation. If false, the scripts won't
># be touched.
># (mandatory for uninstallation of execution/qmaster hosts)
>REMOVE_RC="false"
>---
>
>I don't think they appear as NFS mounts, but you made a good point that
>perhaps Berkeley DB does not like this special method of sharing
>filesystems. Anyway, here's a df of what the directory looks like on
>one of the nodes:
>
>-bash-2.05b# df -h
>Filesystem             size   used  avail capacity  Mounted on
>/                      6.9G   3.9G   2.9G    58%    /
>/dev                   6.9G   3.9G   2.9G    58%    /dev
>/export/home            26G    27M    26G     1%    /export/home
>/lib                   6.9G   3.9G   2.9G    58%    /lib
>/opt                   6.9G   3.9G   2.9G    58%    /opt
>
>/opt is the directory being shared out.
>
>I can try not to use berkeleydb and see if that works.
>
>Thanks,
>
>Bernard 
>
>  
>
>>-----Original Message-----
>>From: Marco Donauer [mailto:Marco.Donauer at Sun.COM] 
>>Sent: Friday, January 14, 2005 11:30
>>To: users at gridengine.sunsource.net
>>Subject: Re: [GE users] Autoinstallation of SGE 6.0u1 on 
>>Solaris 10 zones
>>
>>Bernhard,
>>
>>Hmm, there is something going wrong with the Berkeley DB init.
>>It looks like a corrupted command.
>>
>>One piece of information for you: the autoinstall does not delete any
>>existing directories; this is to prevent unwanted deletion
>>of system data or databases of other clusters.
>>After a failed autoinstallation, the SGE_CELL directory and
>>the Berkeley DB spooling directory are still there. When the
>>autoinstallation finds one of these dirs, it breaks.
>>But the error log doesn't look like that.
>>
>>Could you send me the install config? (/tmp/gsc_template.conf )
>>
>>Is it right that, when using zones, the local directories
>>look like they are NFS mounted?
>>Please consider this when using the Berkeley DB. Berkeley DB does not
>>run on an NFS mount.
>>
>>Regards,
>>Marco
>>
>>Bernard Li wrote:
>>
>>    
>>
>>>In case the attached log didn't show up, I'm pasting it inline...
>>>
>>>---
>>>Starting qmaster installation!
>>>Reading configuration from file /tmp/gsc_template.conf
>>>
>>>
>>>
>>>Your $SGE_ROOT directory: /opt/sge
>>>
>>>Using SGE_QMASTER_PORT >536<.
>>>
>>>Using SGE_EXECD_PORT >537<.
>>>
>>>Using >default< as CELL_NAME.
>>>Using >/opt/sge/default/spool/qmaster< as QMASTER_SPOOL_DIR.
>>>
>>>
>>>Using >true< as IGNORE_FQDN_DEFAULT.
>>>If it's >true<, the domainname will be ignored.
>>>
>>>Making directories
>>>
>>>Setting spooling method to dynamic
>>>
>>>Dumping bootstrapping information
>>>Initializing spooling database
>>>
>>>
>>>Using >20000-20100< as gid range.
>>>Using >/opt/sge/common/default/spool< as EXECD_SPOOL_DIR.
>>>Using >none< as ADMIN_MAIL.
>>>Reading in complex attributes.
>>>Adding default parallel environments (PE)
>>>Reading in parallel environments:
>>>	PE "make".
>>>Reading in usersets:
>>>	Userset "defaultdepartment".
>>>	Userset "deadlineusers".
>>>usage:
>>>./utilbin/sol-x86/spooldefaults command
>>>
>>>create default entries during installation process
>>>following are the valid commands:
>>>test                          test the spooling framework
>>>adminhosts <template_dir>     create admin hosts
>>>calendars <template_dir>      create calendars
>>>ckpts <template_dir>          create checkpoint environments
>>>complexes <template_dir>      create complexes
>>>configuration <template>      create the global configuration
>>>cqueues <template_dir>        create cluster queues
>>>exechosts <template_dir>      create execution hosts
>>>local_conf <template> <name>  create a local configuration
>>>managers <mgr1> [<mgr2> ...]  create managers
>>>operators <op1> [<op2> ...]   create operators
>>>pes <template_dir>            create parallel environments
>>>projects <template_dir>       create projects
>>>sharetree <template>          create sharetree
>>>submithosts <template_dir>    create submit hosts
>>>users <template_dir>          create users
>>>usersets <template_dir>       create usersets
>>>
>>>Command failed: ./utilbin/sol-x86/spooldefaults managers
>>>
>>>Probably a permission problem. Please check file access permissions.
>>>Check read/write permission. Check if SGE daemons are running.
>>>
>>>Command failed: managers./utilbin/sol-x86/spooldefaults
>>>Probably a permission problem. Please check file access permissions.
>>>Check read/write permission. Check if SGE daemons are running.
>>>---
>>>
>>>Cheers,
>>>
>>>Bernard
>>>
>>>>-----Original Message-----
>>>>From: Bernard Li [mailto:bli at bcgsc.ca]
>>>>Sent: Friday, January 14, 2005 10:56
>>>>To: users at gridengine.sunsource.net
>>>>Subject: RE: [GE users] Autoinstallation of SGE 6.0u1 on Solaris 10 
>>>>zones
>>>>
>>>>Hi Marco:
>>>>
>>>>Thanks - okay, this is clearer now.  I have chown'ed the directory of
>>>>SGE_ROOT to sgeadmin and ran the autoinstallation, however I got the
>>>>attached error log.
>>>>
>>>>I have already chown'ed and chgrp'ed the SGE_ROOT directory
>>>>recursively, so I don't understand why it could be a permission issue.
>>>>
>>>>Are any of these bugs going to be fixed in 6.0u2?  I think getting 
>>>>autoinstallation working seamlessly is very important as this makes 
>>>>deployment of SGE on large clusters much easier.
>>>>
>>>>Thanks,
>>>>
>>>>Bernard
>>>>
>>>>>-----Original Message-----
>>>>>From: Marco Donauer [mailto:Marco.Donauer at Sun.COM]
>>>>>Sent: Friday, January 14, 2005 0:24
>>>>>To: users at gridengine.sunsource.net
>>>>>Subject: Re: [GE users] Autoinstallation of SGE 6.0u1 on Solaris 10
>>>>>zones
>>>>>
>>>>>Bernhard,
>>>>>
>>>>>no, there is still a bug in the autoinstall.
>>>>>The auto procedure uses the owner of the SGE_ROOT as the admin user.
>>>>>I know this is bad, but if your SGE_ROOT dir is owned by the
>>>>>"sgeadmin" user, the adminuser will be "sgeadmin".
>>>>>
>>>>>Regards,
>>>>>Marco
>>>>>
>>>>>Bernard Li wrote:
>>>>>
>>>>>>Hello all:
>>>>>>
>>>>>>Trying to set up SGE 6.0u1 with Solaris 10 containers/zones using 
>>>>>>autoinstallation.
>>>>>>
>>>>>>In the manual it says that 'You must change the ownership of the
>>>>>>sge-root directory to belong to your existing administrative user'.
>>>>>>Let's say I want the administrative user to be 'sgeadmin' - does that
>>>>>>mean I need to install qmaster manually as root, set sgeadmin as the
>>>>>>administrative user, then continue with autoinstallation using the
>>>>>>sgeadmin user account?
>>>>>>
>>>>>>Thanks,
>>>>>>
>>>>>>Bernard


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



