[GE users] Newbie questions

Dan Gruhn Dan.Gruhn at Group-W-Inc.com
Wed Apr 6 16:55:46 BST 2005

Sorry, I replied to the wrong email; it should have been to the
`serial' gaussian03 subject.

On Wed, 2005-04-06 at 11:52, Dan Gruhn wrote:

> Dale,
> Here is a fairly short procedure that could get you going.  It has
> some specifics for our installation, but should be helpful.  Try it on
> an SGE master and a couple of nodes just to start with.  Rueti's
> questions are great to help you decide how you want to do things.
> Dan
> ---------------------------------------
> 1.1. Introduction
> This document describes what a computing grid is, and how the setup
> and configuration of the Sun Grid Engine (SGE) can be accomplished. 
> It is intended as an example of how to set up SGE and is not a
> replacement for the Installation, Administration, and User Guides
> that Sun has available.
> 1.1.1. What Is A Computing Grid?
> A computing grid is a collection of hosts, a cluster computer, a
> multiprocessor host, or any combination of the above tied together
> with software (e.g., SGE) to make available to the user the execution
> of jobs on CPUs in the grid.
> The power of grid computing comes from being able to harness idle time on
> hosts in the grid as well as making easily available computing
> resources that are not directly at a user's fingertips.  Additionally,
> the queuing capability of the grid software makes background execution
> of a task much easier, even on a single machine.
> Usually a system administrator is needed to configure the grid, set
> up the job queues, and define the rules governing which users can use
> which resources.
> 1.2. Sun Grid Engine (SGE) Documentation
> Sun's documentation can be found by searching the list found at the
> location http://docs.sun.com/app/docs/titl.  Search by title for "N1
> grid engine".  At a minimum the Installation, Administration, and
> User's Guides should be downloaded.
> 1.3. Downloading SGE Software
> SGE software can be downloaded from the Grid Engine website at
> http://gridengine.sunsource.net/.  Under "Resources" on the left side
> of the page, click on Download SGE 6.0 Binaries.  Follow the
> instructions on the page to read and accept the license agreement and
> find the proper files for the O/S and host types to download.  It will
> be in a *.tar.gz format.
> 1.4. Installing SGE
> 1.4.1. Planning
> Several decisions must be made before planning an installation:
> * Decide whether the system of networked computer hosts that run N1
>   Grid Engine 6 software (the grid engine system) is to be a single
>   cluster or a collection of sub-clusters, called cells. Cells allow
>   for separate instances of the grid engine software but share the
>   binary files across those instances.
> * Select the machines that are to be grid engine system hosts.
>   Determine the host type of each machine: master host, shadow master
>   host, administration host, submit host, execution host, or a
>   combination.
> * Ensure that all users of the grid engine system have the same user
>   names on all submit and execution hosts.
> * Decide how to organize grid engine software directories. For
>   example, they could be organized as a complete tree on each
>   workstation, as cross-mounted directories, or as a partial
>   directory tree on some workstations. Also decide where to locate
>   each grid engine software installation directory, SGE_ROOT.
> * Decide on the site's queue structure.
> * Determine whether to define network services in NIS or locally on
>   each workstation in /etc/services.
> Chapter 1 of the Installation Guide is very helpful in this process. 
> It discusses each of the areas mentioned above as well as disk space
> requirements.
> As an example for this document, here is how the above decisions were
> made:
> * A single cluster under the default name "default".
> * It has a single Master Host and no Shadow Master Hosts.  All
>   remaining machines are Administration, Submit, and Execution hosts.
> * All users of the grid engine have the same user name on all
>   machines through the use of an LDAP server.
> * The grid engine software directories are mountable via NFS (the
>   network file system) in the same location by any machine in the
>   cluster as /direct/sgeadmin/SunGridEngine.  This directory
>   physically resides on the Master Host.
> * The site's queue structure divides the available machines into 3
>   classes:
>   - high.q for the highest speed, dual-processor, hyper-threaded Xeon
>     3.2 GHz, 1.5 GByte machines.
>   - mid.q for the mid-range speed, single-processor, hyper-threaded
>     Pentium 4 2.8 GHz, 1 GByte machines.
>   - low.q for the lowest speed, single-processor, non-hyper-threaded
>     Pentium 4 1.8 GHz, 512 MByte machines.
>   For special purposes, functionally based queues are created from
>   the same pool of machines.  For example, some machines may be used
>   in the mysql.q job queue to do database loading while others are
>   used in xeq.q for running the model.  Care must be taken when hosts
>   are in multiple queues because SGE could overload a host if
>   overlapping queues are used simultaneously.
> * Network services are defined on each machine via the /etc/services
>   file as follows.  Different service numbers may be chosen, but they
>   MUST be the same on all machines.  (Note that the UDP values are
>   not used by SGE.)
> # Local services
> sge_command     460/tcp     # Sun Grid Computing command port
> sge_command     460/udp     # Sun Grid Computing command port
> sge_qmaster     461/tcp     # Sun Grid Computing queue master port
> sge_qmaster     461/udp     # Sun Grid Computing queue master port
> sge_execd       462/tcp     # Sun Grid Computing xeq port
> sge_execd       462/udp     # Sun Grid Computing xeq port
> 1.4.2. Needed Packages
> For the SGE GUI (qmon) to work, the openmotif21-2.1.30-8 or greater
> package must be installed.
> 1.4.3. Master Host
> To install a Master Host, follow the procedure "How to Install the
> Master Host" in Chapter 2 of the Installation Guide.  
> The following will help while stepping through the install_qmaster
> command interaction:
> * Choose a name for the SGE administrator; sgeadmin is a good choice.
>   This pseudo user must be created before beginning the installation.
> * The SGE_ROOT environment variable should be set to
>   /home/sgeadmin/SunGridEngine or wherever the software was untarred.
> * Make sure the /etc/services file is updated with the correct local
>   services before beginning.
> * Create a file containing the names of the hosts that are to be the
>   execution hosts.  This file will be used later in this procedure.
> * Keep the default cell name of "default" unless planning for
>   multiple cells.
> * The default spool directory is fine for most setups.
> * When asked whether file permissions have been set, enter "n" and
>   then enter "y" to verify and set the file permissions.
> * Answer the questions about the DNS names of the grid engine system
>   hosts.
> * For most installations, choose Berkeley DB spooling, but without a
>   separate spooling server.
> * For a group ID range, 2000-2100 is a reasonable value as long as:
>   1. Established groups do not already extend into this range; and
>   2. It is not likely that more than 100 grid engine jobs will run on
>      the same host at the same time.
>   If more than 100 grid engine jobs are needed, increase the end of
>   the range.
> * The default spool directory for the execution hosts is usually
>   fine.  Give a local directory for speed of execution during
>   Execution Host configuration.
> * Choose a person who can receive email in case of problems.  The
>   email address of the person doing this installation is a good
>   choice.
> * Verify that the configuration parameters look right when asked to
>   do so by the installation script.
> * Request that the qmaster/scheduler startup script be run at startup
>   time.
> * When asked to specify the execution hosts, use the previously
>   created file listing the host names.
> * For a scheduler profile, Normal is acceptable unless the
>   installation is creating a very high performance system that needs
>   to service many users and many different types of jobs.
> Once past this question, the installation process is complete.
> Several screens of information will be displayed before the script
> exits. The commands noted in those screens are also documented in
> Chapter 2 of the Installation Guide.
> At the end of the Master Host installation procedure, the script
> mentions the settings.csh and settings.sh files.  It will be very
> helpful to change the .profile, .bash_profile, .login, or equivalent
> startup script for the root account on each machine that will be part
> of the computing grid to include execution of the appropriate script.
> Doing so means that all of the proper environment variables will be
> set up when using the root account.  Each user of the grid engine
> should also add the same thing to his or her startup script.
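> As a concrete sketch, the startup-script addition could look like the
> following.  The install location below is the NFS path from the
> example above, and the "default" cell name is assumed.

```shell
# Append to ~/.bash_profile (or ~/.profile; csh users would source
# settings.csh from ~/.login instead).  The path is site-specific.
SGE_ROOT=/direct/sgeadmin/SunGridEngine
export SGE_ROOT
# Source the Bourne-shell settings file, if present, so PATH, SGE_CELL,
# and the other SGE variables are set for every login shell.
if [ -f "$SGE_ROOT/default/common/settings.sh" ]; then
    . "$SGE_ROOT/default/common/settings.sh"
fi
```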
> 1.4.4. Execution Hosts
> As with installing a Master Host, use Chapter 2, "How to Install
> Execution Hosts" from the Installation Guide.  The following will
> assist in making decisions:
> * The Master Host MUST be installed before beginning this procedure.
> * The SGE_ROOT environment variable should be set to
>   /home/sgeadmin/SunGridEngine or wherever the software was untarred.
> * Make sure the /etc/services file on each Execution Host is updated
>   with the SGE local services before beginning the installation.
> * Run the install_execd command as root.
> * Make sure that the SGE_ROOT directory is correct, that it is the
>   same as where the SGE tarball was installed, and that it was used
>   for the Administration Host installation.  Alternately, put a local
>   copy of the SGE_ROOT directory on each local host to cut down on
>   NFS traffic.  Note that separate directories on each execution host
>   necessitate additional manual configuration to keep them
>   synchronized as things change.
> * The default cell name of "default" should be sufficient unless
>   there are hosts that cannot directly communicate with each other.
> * Use a local spool directory to keep NFS traffic to a minimum.
>   /var/spool/sge is a reasonable choice.  This directory needs to be
>   created outside of the procedure and needs to be owned by sgeadmin.
> * Have execd start automatically at boot time.
> * Add the default queue instance for the host.  It will help for
>   testing before creating queues specific to the needs of the users.
> Once past the last question, the installation process is complete.
> Several screens of information will be displayed before the script
> exits. The commands that are noted in those screens are also
> documented in Chapter 2 of the Installation Guide.
> At the end of the Execution Host installation procedure, the script
> mentions the settings.csh and settings.sh files.  It will be very
> helpful to change the .profile, .bash_profile, .login, or equivalent
> startup script for each user that will use the computing grid to
> include execution of the appropriate script.  Doing so means that all
> of the proper environment variables will be set up when using the
> grid.
> 1.4.5. Administration Hosts
> For each execution host that should also allow administration of the
> computing grid, follow the procedure "Registering Administration
> Hosts" in Chapter 2 of the Installation Guide.  Basically, run the
> following command as root:
> qconf -ah admin_host_name[,...]
> 1.4.6. Submit Hosts
> For each execution host that should also allow submission of jobs to
> the computing grid, follow the procedure "Registering Submit Hosts"
> in Chapter 2 of the Installation Guide.  Basically, run the following
> command as root:
> qconf -as submit_host_name[,...]
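> For many hosts, the two registrations above can be scripted.  This
> dry-run sketch only prints the qconf commands it would issue; the
> host names are placeholders.

```shell
# Print (dry run) the registration commands for each host.  Replace
# node01..node03 with real host names and remove the leading `echo`
# to actually register them (run as root on the master host).
for h in node01 node02 node03; do
    echo qconf -ah "$h"   # register as an administration host
    echo qconf -as "$h"   # register as a submit host
done
```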
> 1.4.7. Verify the Installation
> Using Chapter 6 of the Installation Guide, verify that the
> installation is up and running.  Follow the "How To Verify That
> Daemons Are Running On The Master Host" and "How To Verify That The
> Daemons Are Running On The Execution Hosts" procedures.
> Once this all works, try submitting one of the sample scripts
> contained in the $SGE_ROOT/examples/jobs directory.  For example:
> > qsub $SGE_ROOT/examples/jobs/simple.sh
> Use the qstat command to monitor the job's behavior.
> For more information about submitting and monitoring batch jobs, see
> Submitting Batch Jobs in chapter 3 of the N1 Grid Engine 6 User's
> Guide.
> After the job finishes executing, check the home directory for the
> redirected stdout/stderr files <script-name>.e<job-id> and
> <script-name>.o<job-id>, where <job-id> is a consecutive unique
> integer number assigned to each job.
> In case of problems, see Chapter 8, "Fine Tuning, Error Messages,
> and Troubleshooting," in the N1 Grid Engine 6 Administration Guide.
> 1.5. Configuring Users
> 1.5.1. Users
> The simplest way to configure users is to use user-based equal
> sharing of resources, with automatic registration of users.  To do
> this, configure the cluster global configuration (see sge_conf(5))
> with the following:
> enforce_user auto
> auto_user_fshare 100
> Using the qmon GUI configuration tool, do the following:
> A. Click the "Cluster Configuration" button, select "global" in the
> left column, and click the "Modify" button.
> B. In the General Settings tab, look for the "Automatic User
> Defaults" area at the lower right and set "Functional Shares" to 100.
> C. Just above that, set "Enforce User" to "Auto" ("Enforce Project"
> should be "False").
> Next, configure the scheduler configuration (see sched_conf(5)) with
> the following:
> weight_tickets_functional 10000
> Again using qmon, click the "Policy Configuration" button.  In the
> "Ticket Policy" section, set "Total Functional Tickets" to 10000.
> This will result in each user being automatically registered in the
> computing grid when they submit a job, and in each user having equal
> access to grid resources.  That is, if Bob and Wanda both submit
> jobs, barring any other constraints, they will share the currently
> available computing resources equally.  View the currently registered
> set of users by using qmon, clicking the "User Configuration" button,
> and selecting the User tab.
> 1.5.2. Managers
> Managers can perform any operation the Grid Engine is capable of
> performing.  To configure users who have manager privileges for the
> grid, use qmon and click on the "User Configuration" button. Under the
> Manager tab, enter the names of users who will be managers and click
> the "Add" button.  See chapter 4 of the N1 Grid Engine 6
> Administration Guide for details on what managers can do.
> 1.5.3. Operators
> Operators have more privileges than simple users, but less than
> managers.  Use the Operator tab in the "User Configuration" screen to
> enter operators.  See chapter 4 of the N1 Grid Engine 6
> Administration Guide for details on what operators can do.
> 1.6. Configuring Job Queues
> Queues are containers for different categories of jobs. Queues
> provide the corresponding resources for concurrent execution of
> multiple jobs that belong to the same category.
> In SGE, a queue can be associated with one host or with multiple
> hosts. Because queues can extend across multiple hosts, they are
> called cluster queues. Cluster queues enable managing a cluster of
> execution hosts by means of a single cluster queue configuration and
> name.
> Each host that is associated with a cluster queue receives an instance
> of that cluster queue, which resides on that host. These instances are
> known as queue instances. Within any cluster queue, each queue
> instance can be configured separately. By configuring individual queue
> instances, a heterogeneous cluster of execution hosts can be managed
> by means of a single cluster queue configuration and name. 
> When modifying a cluster queue, all of its queue instances are
> modified simultaneously. Within a cluster queue, differences in the
> configuration of queue instances can be specified by separately
> adding the associated host and modifying its attributes.
> Consequently, a typical setup might have only a few cluster queues,
> and the queue instances controlled by those cluster queues remain
> largely in the background.
> NOTE: The distinction between cluster queues and queue instances is
> important. For example, jobs always run in queue instances, not in
> cluster queues.
> When configuring a cluster queue, any combination of the following
> host objects can be associated with the cluster queue:
> * One execution host
> * A list of separate execution hosts
> * One or more host groups
> A host group is a group of hosts that can be treated collectively as
> identical. Host groups enable management of multiple hosts by means
> of a single host group configuration. For more information about host
> groups, see "Configuring Host Groups With QMON" in chapter 1 of the
> Administration Guide.
> When associating individual hosts with a cluster queue, the name of
> the resulting queue instance on each host combines the cluster queue
> name with the host name. The cluster queue name and the host name are
> separated by an @ sign. For example, if associating the host
> myexechost with the cluster queue myqueue, the resulting queue
> instance is called myqueue@myexechost.
> When associating a host group with a cluster queue, a queue domain is
> created. Queue domains enable management of groups of queue instances
> that are part of the same cluster queue and whose assigned hosts are
> part of the same host group. A queue domain name combines a cluster
> queue name with a host group name, separated by an @ sign. For
> example, if the host group @myhostgroup (host group names must start
> with an @) is associated with the cluster queue myqueue, the resulting
> queue domain is myqueue@@myhostgroup.
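> The naming scheme is plain string concatenation, as this small shell
> sketch (using the example names from above) shows:

```shell
# Queue instance names are queue@host; queue domain names are
# queue@hostgroup.  Host group names themselves begin with @, which is
# why a queue domain ends up with a doubled @@.
queue=myqueue
host=myexechost
hostgroup=@myhostgroup
echo "${queue}@${host}"       # queue instance: myqueue@myexechost
echo "${queue}@${hostgroup}"  # queue domain:   myqueue@@myhostgroup
```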
> 1.6.1. Adding Queues
> Using qmon, click the "Queue Control" button and then click the "Add"
> button.  First, enter the "Queue Name" (by convention, queue names
> always end in .q, as in "fast.q").  Choose the name with care; it
> cannot be changed later.
> Next, enter a host or host group name in the "New Host/Hostgroup" box
> and click the red left arrow.  Enter as many hosts or host groups as
> needed; their names will appear in the "Hostlist" box at the top left
> of the window.
> The "@/" listing in the "Attributes for Host/Hostgroup" list on the
> lower left of the window denotes attributes that are the default for
> each host or hostgroup in this queue.  Hosts or host groups from the
> Hostlist box can be added to this listing, and their attributes can
> be specified differently from the defaults, by entering their name in
> the "New Host/Hostgroup" box and clicking the red up arrow.
> NOTE: When changing an attribute, the padlock icon associated with the
> attribute may need to be clicked to unlock the field for entry.
> 1.6.2. General Configuration
> The following attributes will need to be set for queues in the tab
> with the "General Configuration" label.
> Processors
> Set to the number of processors either as the default or for the
> specific queue.
> Slots
> This is the number of jobs that can be active on a host
> simultaneously.  This can be more than the number of processors if
> host over-scheduling is desired.  Also, it could be twice the number
> of processors if they are Intel processors with hyper-threading. 
> Experiment with what gives the best overall performance for the host.
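> The sizing rule above can be written out explicitly; the machine
> counts here are the hypothetical dual-processor high.q hosts from the
> earlier example.

```shell
# Two physical processors with hyper-threading: allow up to twice as
# many slots as processors.  Tune by experiment, as noted above.
processors=2
hyperthreaded=1   # 1 = yes, 0 = no
if [ "$hyperthreaded" -eq 1 ]; then
    slots=$((processors * 2))
else
    slots=$processors
fi
echo "slots=$slots"   # prints slots=4
```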
> Notify Time
> Scripts can catch signals sent by the Grid Engine to know when a job
> is about to be killed.  Because of this, at least 1 minute of time
> should be set in this field to allow for delays under heavy processor
> loads.
> 1.7. Cluster Configuration
> 1.7.1. Global Job Submission Parameters
> The file $SGE_ROOT/<cellname>/common/sge_request contains default
> parameters for the qsub command.  For ease of writing shell scripts
> that will be submitted to SGE, the following parameters are
> suggested.  These parameters can be ignored by using the -clear
> parameter as the first parameter on the qsub command line.
> -w e    Give errors and exit if a job being submitted can never be
>         scheduled.
> -V      Export all variables from the user's environment into the
>         job's environment.  By default, SGE builds a very minimal
>         environment.
> -cwd    Run the job in the directory from which it was submitted.  By
>         default, SGE will run the job from the user's login directory.
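> Taken together, the suggested defaults file would read as follows (a
> sketch; the cell name "default" is assumed):

```
# $SGE_ROOT/default/common/sge_request
# Site-wide default qsub parameters, overridable with qsub -clear.
-w e
-V
-cwd
```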
> 1.7.2. Job/Shell Scripts
> Shell Start Mode
> For ease of use of custom-written scripts, set the global cluster
> shell start mode so that the shell for the script to use is given by
> the first line of the script, just as if it were run directly from a
> command line. To do this, configure the cluster global configuration
> (see sge_conf(5)) with the following:
> shell_start_mode unix_behavior
> Using qmon, click the "Cluster Configuration" button, select "global"
> in the left column, and click the "Modify" button.  In the General
> Settings tab, find "Shell Start Mode" along the left side and set the
> selection to unix_behavior.
> Active Comments
> SGE allows for what are called active comments.  These comments are a
> way to embed command line arguments to qsub in scripts.  By default,
> active comments are found on lines that begin with "#$" (the "$" can
> be changed).  The following are generally useful:
> #$ -o /dev/null -j y    Send all job output to the bit bucket.  By
> default, SGE will send output to the file
> <jobname>.o<jobid>.<tasknumber>.
> #$ -m e    Send a notification email to the user submitting the job
> when it completes.  Additional letters can be used in place of or
> appended to the "e" with the following meanings:
> `b'     Mail is sent at the beginning of the job.
> `e'     Mail is sent at the end of the job.
> `a'     Mail is sent when the job is aborted or rescheduled.
> `s'     Mail is sent when the job is suspended.
> `n'     No mail is sent.
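> A minimal job script using these active comments might begin like the
> following; the body is a placeholder.  Outside SGE, the "#$" lines
> are ordinary comments and the script still runs.

```shell
#!/bin/sh
#$ -o /dev/null -j y
#$ -m ea
# "ea" asks SGE to mail when the job ends and if it is aborted.
echo "job body runs here"
```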
> SGE Environmental Variables
> SGE makes available a number of environment variables for use by job
> scripts.  To make scripts run with or without SGE, the following
> lines have been found to be useful.  Note that the syntax
> ": ${<variable>=<value>}"
> tells the shell to set the given variable to the given value IF the
> variable is not already set.
> # Set up restart status
> : ${RESTARTED=0}
> # Get our host name without any domain name.
> xeqHost=`echo $HOSTNAME | sed 's/\..*//'`
> # Get the name of the host that originally submitted the job
> : ${SGE_O_HOST=`uname -n`}
> submitHost=`echo $SGE_O_HOST | sed 's/\..*//'`
> # Set a default task number if not using SGE
> : ${SGE_TASK_ID=1}
> # For non-array jobs SGE sets SGE_TASK_ID to the literal string
> # "undefined", so map that back to a task (rep) number of 1.
> if [ "$SGE_TASK_ID" = "undefined" ]
> then
>     SGE_TASK_ID=1
> fi
> # Get our command name if not being run by SGE
> : ${REQUEST=$0}
> myName=$REQUEST
> cmdRoot=`basename $myName`
> myPath=`dirname $myName`
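> The ": ${var=value}" idiom can be checked quickly outside SGE, where
> none of the SGE_* variables are set (DEMO_TASK_ID below is just an
> illustrative name):

```shell
unset DEMO_TASK_ID
: ${DEMO_TASK_ID=1}    # not set, so it receives the default
echo "$DEMO_TASK_ID"   # prints 1
DEMO_TASK_ID=7
: ${DEMO_TASK_ID=1}    # already set, so it is left unchanged
echo "$DEMO_TASK_ID"   # prints 7
```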
> 1.8. Job Submission
> 1.8.1. Submitting Jobs
> Chapter 3 of the User's Guide is an excellent resource for
> understanding how to submit job scripts or binary executables to SGE.
> One additional thing should be noted.  When using the qsub command,
> SGE does not search the PATH variable to find the command being
> submitted. In other words, if the following command is entered:
> qsub -q fast.q -t 1-10 myScript
> it will not work unless myScript is in the current directory.  qsub
> will issue the error message "Unable to read script file because of
> error: error opening myScript: No such file or directory".
> To prevent this, use the which command to find the script and then
> give it to qsub as follows:
> qsub -q fast.q -t 1-10 `which myScript`
> It is also acceptable to just type in the absolute pathname to the
> script: 
> qsub -q fast.q -t 1-10 /home/john/.bin/myScript
> On Wed, 2005-04-06 at 11:39, Schmitz Dale M Contr 20 IS/INPTG wrote: 
> > The job is a script...is there something else I must do for the engine?
> > 
> > -----Original Message-----
> > From: raysonho at eseenet.com [mailto:raysonho at eseenet.com] 
> > Sent: Wednesday, April 06, 2005 11:35 AM
> > To: users at gridengine.sunsource.net
> > Subject: Re: [GE users] Newbie questions
> > 
> > >Initial attempts at
> > >running an application on the engine have all failed
> > 
> > How did you submit your jobs, and did you create a job script??
> > 
> > 
> > > Does my software require recompiling for the grid engine
> > > environment?  
> > 
> > No.
> > 
> > Rayson
> > 
> > ---------------------------------------------------------
> > Get your FREE E-mail account at http://www.eseenet.com !
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> > 

More information about the gridengine-users mailing list