[GE users] Autoinstallation of SGE 6.0u1 on Solaris 10 zones

Bernard Li bli at bcgsc.ca
Sat Jan 15 01:13:35 GMT 2005


Hi Marco:

Attached is the log file with debug.
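
For anyone who wants to reproduce such a log, here is a minimal sketch of one way to capture the trace. It assumes that inst_sge is a plain /bin/sh script and that -m selects the qmaster installation; neither is stated in this thread. The helper name trace_to_log and the log path are made up for illustration.

```shell
#!/bin/sh
# Run a shell script with tracing enabled and save stdout and stderr to a log.
# Usage: trace_to_log <logfile> <script> [args...]
# trace_to_log is a hypothetical helper, not part of SGE.
trace_to_log() {
    log=$1
    shift
    # sh -x prints each command to stderr before running it (same as set -x)
    sh -x "$@" > "$log" 2>&1
}

# Assumed invocation for the qmaster autoinstall discussed in this thread:
# trace_to_log /tmp/inst_sge_debug.log ./inst_sge -m -auto /tmp/gsc_template.conf
```

The function returns the exit status of the traced script, so a failed installation can still be detected by the caller.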

spooldb should be seen as a 'local' file system on the node; it is also
shared out as part of /opt.
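
One thing that may be worth checking: the template quoted below sets DB_SPOOLING_DIR="spooldb", while its own comments ask for fully qualified pathnames. Whether that is related to the failure is not established in this thread. The following is only a sketch of how such settings could be checked; check_abs_paths is a hypothetical helper, and the list of keys is my selection from the template.

```shell
#!/bin/sh
# Report path-like settings in an autoinstall config that are not absolute.
# Usage: check_abs_paths <conf-file>
# Prints one line per offending setting and returns 1 if any were found.
# check_abs_paths is a hypothetical helper, not part of SGE.
check_abs_paths() {
    conf=$1
    bad=0
    for key in SGE_ROOT QMASTER_SPOOL_DIR EXECD_SPOOL_DIR DB_SPOOLING_DIR; do
        # Take the last KEY="value" assignment that starts a line
        value=$(sed -n "s/^${key}=\"\(.*\)\"[[:space:]]*\$/\1/p" "$conf" | tail -n 1)
        case $value in
            /*) ;;   # absolute path: fine
            "") ;;   # not set or empty: nothing to check
            *)  echo "$key is not an absolute path: $value"; bad=1 ;;
        esac
    done
    return $bad
}

# Example (path taken from this thread):
# check_abs_paths /tmp/gsc_template.conf
```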

Yes, I used the pre-compiled binaries. I was able to install SGE
manually as root, but I have never had any success as the 'sgeadmin'
user.

If I manually install qmaster as root and then chown SGE_ROOT, do you think I
can install the execution hosts automatically by running:

./inst_sge -x -auto <conf>

?
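
Since the template says remote installation is only supported when rshd/sshd does not ask for a password, a pre-flight check along these lines might help before trying the command above. This is a sketch under assumptions: the config file is sourceable by sh, OpenSSH is the ssh in use, and check_exec_hosts and SSH_CMD are names invented here.

```shell
#!/bin/sh
# Check that every host in EXEC_HOST_LIST accepts a non-interactive login.
# SSH_CMD is a hypothetical override hook; by default it uses OpenSSH in
# batch mode, which fails instead of prompting for a password.
SSH_CMD=${SSH_CMD:-"ssh -o BatchMode=yes -o ConnectTimeout=5"}

# Usage: check_exec_hosts <conf-file>   (give a path containing a slash)
# Prints "ok" or "FAIL" per host and returns 1 if any host failed.
check_exec_hosts() {
    conf=$1
    # Source the config inside a command substitution so its variables
    # do not leak into the calling shell
    hosts=$( . "$conf" && echo "$EXEC_HOST_LIST" )
    failed=0
    for host in $hosts; do
        if $SSH_CMD "$host" true </dev/null >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return $failed
}

# Once all hosts report ok, the command from this message would be run:
# ./inst_sge -x -auto <conf>
```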

Thanks,

Bernard
 

> -----Original Message-----
> From: Marco Donauer [mailto:Marco.Donauer at Sun.COM] 
> Sent: Friday, January 14, 2005 11:51
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Autoinstallation of SGE 6.0u1 on Solaris 10 zones
> 
> Bernard,
> 
> You're right, it doesn't look like a directory problem. That was meant
> as additional information only.
> Your config file is OK. Is the spooldb directory on a local file system?
> Has it been created?
> I think so, because the creation and first initialization of the
> Berkeley DB works.
> It fails at the point where the managers are added!
> 
> Did you use precompiled binaries? Is it possible that your package is
> corrupt?
> Does the manual installation work?
> 
> At the moment I have no further ideas.
> Could you send me the script debug output?
> Do you know the set -x shell switch?
> 
> Regards,
> Marco
> 
> Bernard Li wrote:
> 
> >Hi Marco:
> >
> >I have always done a fresh autoinstallation (i.e. revert the directory
> >to the state before installation), so I don't think the problem is with
> >the existence of directories.
> >
> >The following is the template I've used:
> >
> >---
> >#-------------------------------------------------
> ># SGE default configuration file
> >#-------------------------------------------------
> >
> ># Use always fully qualified pathnames, please
> >
> ># SGE_ROOT Path, this is basic information
> >#(mandatory for qmaster and execd installation)
> >SGE_ROOT="/opt/sge"
> >
> ># SGE_QMASTER_PORT is used by qmaster for communication
> ># Please enter the port in this way: 1300
> ># Please do not do this: 1300/tcp
> >#(mandatory for qmaster installation)
> >SGE_QMASTER_PORT="536"
> >
> ># SGE_EXECD_PORT is used by execd for communication
> ># Please enter the port in this way: 1300
> ># Please do not do this: 1300/tcp
> >#(mandatory for qmaster installation)
> >SGE_EXECD_PORT="537"
> >
> ># CELL_NAME, will be a dir in SGE_ROOT, contains the common dir
> ># Please enter only the name of the cell. No path, please
> >#(mandatory for qmaster and execd installation)
> >CELL_NAME="default"
> >
> ># The dir, where qmaster spools these parts, which are not spooled by DB
> >#(mandatory for qmaster installation)
> >QMASTER_SPOOL_DIR="/opt/sge/default/spool/qmaster"
> >
> ># The dir, where the execd spools (active jobs)
> ># This entry is needed, even if you are going to use
> ># berkeley db spooling. Only cluster configuration and jobs will
> ># be spooled in the database. The execution daemon still needs a spool
> ># directory
> >#(mandatory for qmaster installation)
> >EXECD_SPOOL_DIR="/opt/sge/common/default/spool"
> >
> ># For monitoring and accounting of jobs, every job will get
> ># unique GID. So you have to enter a free GID Range, which
> ># is assigned to each job running on a machine.
> ># If you want to run 100 Jobs at the same time on one host you
> ># have to enter a GID-Range like that: 16000-16100
> >#(mandatory for qmaster installation)
> >GID_RANGE="20000-20100"
> >
> ># If SGE is compiled with -spool-dynamic, you have to enter here, which
> ># spooling method should be used. (classic or berkeleydb)
> >#(mandatory for qmaster installation)
> >SPOOLING_METHOD="berkeleydb"
> >
> ># Name of the Server, where the Spooling DB is running on
> ># if spooling method is berkeleydb, it must be "none", when
> ># using no spooling server and it must contain the servername
> ># if a server should be used. In case of "classic" spooling,
> ># can be left out
> >DB_SPOOLING_SERVER="none"
> >
> ># The dir, where the DB spools
> ># If berkeley db spooling is used, it must contain the path to
> ># the spooling db. Please enter the full path. (eg. /tmp/data/spooldb)
> ># Remember, this directory must be local on the qmaster host or on the
> ># Berkeley DB Server host. No NFS mount, please
> >DB_SPOOLING_DIR="spooldb"
> >
> ># A List of Host which should become admin hosts
> ># If you do not enter any host here, you have to add all of your hosts
> ># by hand, after the installation. The autoinstallation works without
> ># any entry
> >ADMIN_HOST_LIST="headnode node01 node02 node03 node04 node05 node06
> >node07 node08 node09 node10 node11 node12 node13 node14 node15 node16
> >node17 node18 node19 node20"
> >
> ># A List of Host which should become submit hosts
> ># If you do not enter any host here, you have to add all of your hosts
> ># by hand, after the installation. The autoinstallation works without
> ># any entry
> >SUBMIT_HOST_LIST="headnode"
> >
> ># A List of Host which should become exec hosts
> ># If you do not enter any host here, you have to add all of your hosts
> ># by hand, after the installation. The autoinstallation works without
> ># any entry
> ># (mandatory for execution host installation)
> >EXEC_HOST_LIST="node01 node02 node03 node04 node05 node06 node07 node08
> >node09 node10 node11 node12 node13 node14 node15 node16 node17 node18
> >node19 node20"
> >
> ># The dir, where the execd spools (local configuration)
> ># If you want configure your execution daemons to spool in
> ># a local directory, you have to enter this directory here.
> ># If you do not want to configure a local execution host spool directory
> ># please leave this empty
> >EXECD_SPOOL_DIR_LOCAL="/opt/sge/default/spool"
> >
> ># If true, the domainnames will be ignored, during the hostname resolving
> ># if false, the fully qualified domain name will be used for name resolving
> >HOSTNAME_RESOLVING="true"
> >
> ># Shell, which should be used for remote installation (rsh/ssh)
> ># This is only supported, if your hosts and rshd/sshd is configured,
> ># not to ask for a password, or prompting any message.
> >SHELL_NAME="ssh"
> >
> ># Enter your default domain, if you are using /etc/hosts or NIS configuration
> >DEFAULT_DOMAIN="none"
> >
> ># If a job stops, fails, finishes, you can send a mail to this address
> >ADMIN_MAIL="none"
> >
> ># If true, the rc scripts (sgemaster, sgeexecd, sgebdb) will be added,
> ># to start automatically during boottime
> >ADD_TO_RC="false"
> >
> >#If this is "true" the file permissions of executables will be set to 755
> >#and of ordinary file to 644.
> >SET_FILE_PERMS="true"
> >
> ># This option is not implemented, yet.
> ># When a exechost should be uninstalled, the running jobs will be rescheduled
> >RESCHEDULE_JOBS="wait"
> >
> ># Enter a one of the three distributed scheduler tuning configuration sets
> ># (1=normal, 2=high, 3=max)
> >SCHEDD_CONF="1"
> >
> ># The name of the shadow host. This host must have read/write permission
> ># to the qmaster spool directory
> ># If you want to setup a shadow host, you must enter the servername
> ># (mandatory for shadowhost installation)
> >SHADOW_HOST="hostname"
> >
> ># Remove this execution hosts in automatic mode
> ># (mandatory for uninstallation of execution hosts)
> >EXEC_HOST_LIST_RM="host1 host2 host3 host4"
> >
> ># This option is used for startup script removing.
> ># If true, all rc startup scripts will be removed during
> ># automatic deinstallation. If false, the scripts won't
> ># be touched.
> ># (mandatory for uninstallation of execution/qmaster hosts)
> >REMOVE_RC="false"
> >---
> >
> >I don't think they appear as NFS mounts, but you made a good point that
> >perhaps Berkeley DB does not like this special method of sharing
> >filesystems. Anyway, here's a df of what the directory looks like on
> >one of the nodes:
> >
> >-bash-2.05b# df -h
> >Filesystem             size   used  avail capacity  Mounted on
> >/                      6.9G   3.9G   2.9G    58%    /
> >/dev                   6.9G   3.9G   2.9G    58%    /dev
> >/export/home            26G    27M    26G     1%    /export/home
> >/lib                   6.9G   3.9G   2.9G    58%    /lib
> >/opt                   6.9G   3.9G   2.9G    58%    /opt
> >
> >/opt is the directory being shared out.
> >
> >I can try not to use berkeleydb and see if that works.
> >
> >Thanks,
> >
> >Bernard
> >
> >>-----Original Message-----
> >>From: Marco Donauer [mailto:Marco.Donauer at Sun.COM]
> >>Sent: Friday, January 14, 2005 11:30
> >>To: users at gridengine.sunsource.net
> >>Subject: Re: [GE users] Autoinstallation of SGE 6.0u1 on Solaris 10 zones
> >>
> >>Bernard,
> >>
> >>Hm, there is something going wrong with the Berkeley DB init.
> >>It looks like a corrupted command.
> >>
> >>One piece of information for you: the autoinstall does not delete any
> >>existing directories. This prevents unwanted deletion of system data or
> >>databases of other clusters.
> >>After a failed autoinstallation the SGE_CELL directory and the
> >>Berkeley DB spooling directory are still there. When the
> >>autoinstallation finds one of these directories, it aborts.
> >>But the error log doesn't look like this.
> >>
> >>Could you send me the install config? (/tmp/gsc_template.conf)
> >>
> >>Is it right that, when using zones, the local directories look like
> >>NFS mounts?
> >>Please consider this when using Berkeley DB: it does not run on an
> >>NFS mount.
> >>
> >>Regards,
> >>Marco
> >>
> >>Bernard Li wrote:
> >>
> >>>In case the attached log didn't show up, I'm pasting it inline...
> >>>
> >>>---
> >>>Starting qmaster installation!
> >>>Reading configuration from file /tmp/gsc_template.conf
> >>>
> >>>
> >>>
> >>>Your $SGE_ROOT directory: /opt/sge
> >>>
> >>>Using SGE_QMASTER_PORT >536<.
> >>>
> >>>Using SGE_EXECD_PORT >537<.
> >>>
> >>>Using >default< as CELL_NAME.
> >>>Using >/opt/sge/default/spool/qmaster< as QMASTER_SPOOL_DIR.
> >>>
> >>>
> >>>Using >true< as IGNORE_FQDN_DEFAULT.
> >>>If it's >true<, the domainname will be ignored.
> >>>
> >>>Making directories
> >>>
> >>>Setting spooling method to dynamic
> >>>
> >>>Dumping bootstrapping information
> >>>Initializing spooling database
> >>>
> >>>
> >>>Using >20000-20100< as gid range.
> >>>Using >/opt/sge/common/default/spool< as EXECD_SPOOL_DIR.
> >>>Using >none< as ADMIN_MAIL.
> >>>Reading in complex attributes.
> >>>Adding default parallel environments (PE)
> >>>Reading in parallel environments:
> >>>	PE "make".
> >>>Reading in usersets:
> >>>	Userset "defaultdepartment".
> >>>	Userset "deadlineusers".
> >>>usage:
> >>>./utilbin/sol-x86/spooldefaults command
> >>>
> >>>create default entries during installation process
> >>>following are the valid commands:
> >>>test                          test the spooling framework
> >>>adminhosts <template_dir>     create admin hosts
> >>>calendars <template_dir>      create calendars
> >>>ckpts <template_dir>          create checkpoint environments
> >>>complexes <template_dir>      create complexes
> >>>configuration <template>      create the global configuration
> >>>cqueues <template_dir>        create cluster queues
> >>>exechosts <template_dir>      create execution hosts
> >>>local_conf <template> <name>  create a local configuration
> >>>managers <mgr1> [<mgr2> ...]  create managers
> >>>operators <op1> [<op2> ...]   create operators
> >>>pes <template_dir>            create parallel environments
> >>>projects <template_dir>       create projects
> >>>sharetree <template>          create sharetree
> >>>submithosts <template_dir>    create submit hosts
> >>>users <template_dir>          create users
> >>>usersets <template_dir>       create usersets
> >>>
> >>>Command failed: ./utilbin/sol-x86/spooldefaults managers
> >>>
> >>>Probably a permission problem. Please check file access permissions.
> >>>Check read/write permission. Check if SGE daemons are running.
> >>>
> >>>Command failed: managers./utilbin/sol-x86/spooldefaults
> >>>Probably a permission problem. Please check file access permissions.
> >>>Check read/write permission. Check if SGE daemons are running.
> >>>---
> >>>
> >>>Cheers,
> >>>
> >>>Bernard
> >>>
> >>>>-----Original Message-----
> >>>>From: Bernard Li [mailto:bli at bcgsc.ca]
> >>>>Sent: Friday, January 14, 2005 10:56
> >>>>To: users at gridengine.sunsource.net
> >>>>Subject: RE: [GE users] Autoinstallation of SGE 6.0u1 on Solaris 10
> >>>>zones
> >>>>
> >>>>Hi Marco:
> >>>>
> >>>>Thanks - okay, this is clearer now.  I have chown'ed the directory of
> >>>>SGE_ROOT to sgeadmin and ran the autoinstallation; however, I got the
> >>>>attached error log.
> >>>>
> >>>>I have already run chown and chgrp recursively on the SGE_ROOT
> >>>>directory, so I don't understand why it could be a permission issue.
> >>>>
> >>>>Are any of these bugs going to be fixed in 6.0u2?  I think getting
> >>>>autoinstallation working seamlessly is very important as this makes
> >>>>deployment of SGE on large clusters much easier.
> >>>>
> >>>>Thanks,
> >>>>
> >>>>Bernard
> >>>>
> >>>>>-----Original Message-----
> >>>>>From: Marco Donauer [mailto:Marco.Donauer at Sun.COM]
> >>>>>Sent: Friday, January 14, 2005 0:24
> >>>>>To: users at gridengine.sunsource.net
> >>>>>Subject: Re: [GE users] Autoinstallation of SGE 6.0u1 on Solaris 10
> >>>>>zones
> >>>>>
> >>>>>Bernard,
> >>>>>
> >>>>>No, there is still a bug in the autoinstall.
> >>>>>The auto procedure uses the owner of the SGE_ROOT as admin user.
> >>>>>I know this is bad, but if your SGE_ROOT dir is owned by the
> >>>>>"sgeadmin" user, the admin user will be "sgeadmin".
> >>>>>
> >>>>>Regards,
> >>>>>Marco
> >>>>>
> >>>>>Bernard Li wrote:
> >>>>>
> >>>>>>Hello all:
> >>>>>>
> >>>>>>Trying to set up SGE 6.0u1 with Solaris 10 containers/zones using
> >>>>>>autoinstallation.
> >>>>>>
> >>>>>>In the manual it says that 'You must change the ownership of the
> >>>>>>sge-root directory to belong to your existing administrative user'.
> >>>>>>Let's say I want the administrative user to be 'sgeadmin' - does that
> >>>>>>mean I need to install qmaster manually as root, set sgeadmin as the
> >>>>>>administrative user then continue with autoinstallation using the
> >>>>>>sgeadmin user account?
> >>>>>>
> >>>>>>Thanks,
> >>>>>>
> >>>>>>Bernard
> >>>>>>
> >>>>>>---------------------------------------------------------------------
> >>>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net



    [ Part 2: "Attached Text" ]

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net


