[GE users] job submission verifier
reuti at staff.uni-marburg.de
Tue Sep 23 13:20:10 BST 2008
On 23.09.2008 at 13:45, Ernst Bablick wrote:
> in the past some users expressed their need for some kind of
> presubmission procedure which is executed whenever a job enters the
> GE system (see also issue #2621).
> Find attached a draft for a corresponding GE enhancement. Please
> give feedback by Friday.
> Functional Specification: Job Submission Verifier
> Version  Comment                                Date      Author
> -------  -------------------------------------  --------  -------------
> 0.1      Initial version                        ?         Andreas Haas
> 0.5      Describe changes so that enhancement   17-09-08  Ernst Bablick
>          can be implemented for Urubu with
>          less performance loss
> 0.6      added missing parts according to       22-09-08  Ernst Bablick
>          discussion with RD and AS
> 1 INTRODUCTION
> In the past some of our users expressed their need for some kind of
> presubmission procedure which is executed whenever a job enters the GE
> system (see also issue #2621). Here are some examples of what should be
> done in such a procedure:
> - Check accounting DB to make sure the user has enough wall
> hours in their account to run the requested job on the
> slots for the requested time.
> - Guarantee that the number of slots requested is a multiple of 16 for
> parallel jobs.
> - Verify that the user can write to various shared filesystems.
> - Make sure that the user does not request certain -l resources that
> might not behave the way the user expects them to (h_vmem,
> - Add required resource requests that users don't know are
> - Add a project request of the form -P queue_name where queue_name is
> the queue used with the -q option.
> - Make sure that the user hasn't messed up their ssh keys so
> that they cannot ssh into compute nodes w/o a passphrase.
> - Print out status messages and errors about the above as well as
> printing out the queue, allocation account name, PE,
> total number of tasks requested, and number of tasks per node
> - Print out an motd-like message at the top of qsub output
>> qsub job.sge
> Please note that we strongly advise using the mvapich-devel MPI
> stack for running jobs with more than 2048 MPI tasks.
> --> Submitting 16 tasks...
> --> Submitting 16 tasks/host...
> --> Submitting exclusive job to 1 hosts...
> 2 PROJECT OVERVIEW
> 2.1 Project Aim
> The aim of the project is to provide an interface enhancement for GE that
> allows the definition of job verification/modification routines which are
> executed on the client side, within the qmaster process, or both when a
> job enters the system.
> 2.2 Project Benefit
> The administrator of a GE cluster can define additional policies needed.
> The GE cluster will not be loaded with jobs which would break a defined
> policy if a job verification/modification routine is defined.
> 2.3 Project Duration
> 2.4 Project Dependencies
> There are no known dependencies with other projects
> 3 SYSTEM ARCHITECTURE
> 3.1 Enhancement Functions
> Here is the summary of the customer needs:
> (N1) The administrator gets the possibility to define a job
> procedure which will be executed in qsub, qrsh, qsh, qlogin, qmon
> and applications using DRMAA, to evaluate a job before it is sent
> to qmaster
> (N2) The administrator gets the possibility to define a job
> procedure which will be executed on qmaster side before a job
> is finally added to the qmaster data store or before the
> modification of a job is finally accepted.
> (N3) It will be possible to define under which user account the
> verification procedure within the master is executed. By default
> the script is executed as sgeadmin user. Within the client context
> the script is executed as the submit user.
> (N4) Data defining the job will be provided to the verification
> (N5) After evaluating a job the verification result might
> either be:
> * accept job
> * correct parameters part of the job specification
> * reject job
> * temporarily reject job (it might be accepted later)
> (N6) Nearly all parameters which define a job can be changed by the
> verification procedure but there are some exceptions. The following
> things are only available as read-only parameters:
> * type (qsub job => qlogin ...)
> * script file to be executed
> * arguments passed to the job
> * user who submitted the job
> The content of the job script itself is not available in the job
> submission verification script.
> (N7) As a minimum requirement at least the following parameters have to be
> changeable by the job verification procedure in a first
> * pe request
> * resource requests (hard and soft)
> * queue and host requests
> * project request
> Implementation notes and necessary steps:
> (I1) (N1) and (N2) will be realized as a script. The script language can
> be chosen by the administrator.
> (I2) The script has to be written in a way so that it can be
> like a loadsensor script. It has to accept commands and
> parameters from stdin and return results via stdout.
> It should not terminate until it gets a corresponding
> (I3) A file named "client_jsv" and located in $SGE_ROOT/
> will be started by the clients qsub, qrsh, qsh, qlogin and qmon and
> the DRMAA library (N1) before a new job is sent to qmaster. This
> script will be started under the user account of the user who
> tries to start a new job
> (I4) The script to be evaluated in qmaster (N2) has to be
> in the cluster configuration. The parameter will be named
> "server_jsv" and, similar to "prolog" and "epilog", it will
> allow specifying under which user privileges this procedure will
> be started. (N3)
> (I5) One instance of server_jsv will be started during startup of
> qmaster for each worker thread or whenever the cluster
> configuration parameter changes or whenever the timestamp of the
> script file changes.
> (I6) The server side instances of the verification scripts are connected
> to the worker threads via pipes. Parameters and commands will
> be sent to the scripts and the response is read from the script
> (I7) After the script has been started it has to be responsive to
> execute the following commands. Please note that each command
> might print ERROR=<message> to stdout to indicate an
> command action
> START  Trashes cached data and starts a verification for a
>        new job.
>        Prints STARTED to stdout
>        After that the script accepts only a BEGIN or one or
>        multiple PARAM_<name>=<value> commands
> BEGIN  This command triggers the verification of
>        parameters set by PARAM_<name>=<value>
>        Prints RESULT=<result> and optionally
>        RESULT_MSG=<message> or RESULT_MSG_LOG=<message>
>        <result> might be:
>        * job is accepted without changes
>        * job is accepted but all PARAM_<name>... which have
>          been sent between the initial BEGIN and the final
>          RESULT have to be evaluated and applied to the job
>          before it is accepted.
>        * job is rejected
>        * job is rejected but might be accepted later
>        <message> is a user readable message
>        which will be sent to the client to be printed as
>        GDI answer (RESULT_MSG) or it will be printed to
>        stdout of the client command (RESULT_MSG_LOG on
>        client side) or it will be printed to the
>        messages file (RESULT_MSG_LOG on master side)
> PARAM_<name>=<value>
>        <name> and <value> are parameter names
>        and corresponding values as documented in submit(1) e.g.
>        <name>       <value>
>        -----------  ---------------------------
>        a            <date_time>
>        ac           <variable>[=<value>],...
>        b            "y" | "n"
>        additionally the following names are supported
>        CLIENT       "qsub" | "qsh" | "qlogin" | "qmon" | "qalter"
>        CONTEXT      "client" | "server"
>                     explains if the script is executed in a client
>                     (N1) or in the master (N2)
>        JOB_ID       <job_id>
>                     (only available on server side)
>        SCRIPT       <path_of_job_script>
>        SCRIPT_ARGS  <arguments_for_job_script>
>        USER         <submit_user_name>
> QUIT   Terminates the job submission verification script
> Example: Find below the data which is sent to the job
> verification script, when following job is
>> qsub -pe pe1 3 -hard -l lic=1 -soft -l q=all.q troete.sh
> Please note that parameters that are not
> requested by the submitter of a job are not passed
> to the script. This means that e.g. "-b n" of qsub won't be
> passed to the script because this is the default
> when nothing else is specified.
>     Input                       Output
> 01) "START\n"
> 02)                             "STARTED\n"
> 03) "PARAM_CLIENT=qsub\n"
> 04) "PARAM_USER=ernst\n"
> 05) "PARAM_pe=pe1 3\n"
> 06) "PARAM_hard=\n"
> 07) "PARAM_l=lic=1\n"
> 08) "PARAM_soft=\n"
> 09) "PARAM_l=q=all.q\n"
> 10) "PARAM_SCRIPT=troete.sh\n"
> 11) "BEGIN\n"
> 12)                             "PARAM_pe=pe1 4\n"
> 13)                             "RESULT_MSG=no multiple of 4\n"
> 14)                             "RESULT=CORRECT\n"
> 15) "START\n"
> 16)                             "STARTED\n"
> 17) ...
> 99) "QUIT\n"
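
To make the quoted protocol concrete, here is a minimal sketch of a
verification script in plain sh. It is not taken from the draft: the
function names are made up, the policy (round the slot count of a pe
request up to the next multiple of 4) only mirrors the transcript above,
and the result name ACCEPT is an assumption, since the transcript shows
only CORRECT.

```shell
#!/bin/sh
# Sketch of a verification script for the quoted draft protocol.
# Assumptions (not from the draft): the helper names, the result name
# ACCEPT, and the "multiple of 4" policy borrowed from the transcript.

pe_name=""
pe_slots=""

# Handle one PARAM_<name>=<value> line; only the pe request is kept here.
handle_param() {
    name=${1%%=*}          # text before the first "="
    name=${name#PARAM_}    # drop the PARAM_ prefix
    value=${1#*=}          # text after the first "="
    if [ "$name" = "pe" ]; then
        pe_name=${value%% *}
        pe_slots=${value##* }
    fi
}

# Check the cached pe request and print the result lines.
verify() {
    case $pe_slots in
        ''|*[!0-9]*) echo "RESULT=ACCEPT"; return ;;  # no plain slot count
    esac
    if [ $((pe_slots % 4)) -ne 0 ]; then
        fixed=$(( (pe_slots / 4 + 1) * 4 ))
        echo "PARAM_pe=$pe_name $fixed"
        echo "RESULT_MSG=no multiple of 4"
        echo "RESULT=CORRECT"
    else
        echo "RESULT=ACCEPT"   # assumed name for "accepted without changes"
    fi
}

# Command loop: runs until QUIT or end of input.
jsv_loop() {
    while IFS= read -r line; do
        case $line in
            START)   pe_name=""; pe_slots=""; echo "STARTED" ;;
            PARAM_*) handle_param "$line" ;;
            BEGIN)   verify ;;
            QUIT)    return 0 ;;
            *)       echo "ERROR=unknown command: $line" ;;
        esac
    done
}

# Demo: replay a shortened version of the transcript.
printf 'START\nPARAM_CLIENT=qsub\nPARAM_pe=pe1 3\nBEGIN\nQUIT\n' | jsv_loop
```

Fed with this input, the loop answers with STARTED, PARAM_pe=pe1 4,
RESULT_MSG=no multiple of 4 and RESULT=CORRECT, matching lines 02 and
12 to 14 of the transcript. A real script would call jsv_loop on its own
stdin instead of the demo input.
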
Looks feasible. Questions:
- Are all options from "sge_request" already included here?
- Will -soft and -hard be grouped (maybe they should be mentioned per
parameter for easier parsing)?
- How are multiple resource requests encoded? I mean "-l type1=5,type2=8":
will it be "PARAM_type1=5\n" plus "PARAM_type2=8\n" or just in one
Somehow this means implementing a parser in the script to look for
"=" and strip off the "PARAM_". Maybe it would be easier to send these
items by sending a line with:
"PARAM" "CLIENT" "qsub"\n
Then the script could simply use (note the use of ' and " for
$ line='"PARAM" "CLIENT" "qsub"'
$ eval set $line
$ echo $1
PARAM
$ echo $2
CLIENT
$ echo $3
qsub
Even this works:
$ line='"PARAM" "l" "type" "with some blanks"'
$ eval set $line
$ echo $4
with some blanks
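
For comparison, a sketch of the parser that the draft's
PARAM_<name>=<value> form would need, using only POSIX parameter
expansion. The helper name split_param is made up for this example, and
the comma-separated value is an assumption: it presumes the answer to
the resource request question above is "one line per -l option".

```shell
#!/bin/sh
# Sketch: split a draft-style "PARAM_<name>=<value>" line.
# split_param is a hypothetical helper, not part of the draft.
split_param() {
    rest=${1#PARAM_}       # strip the prefix
    name=${rest%%=*}       # up to the first "="
    value=${rest#*=}       # everything after the first "="
}

split_param 'PARAM_l=type1=5,type2=8'
echo "$name"
echo "$value"

# Assumed single-line form: split the value on commas into requests.
old_ifs=$IFS
IFS=,
set -- $value
IFS=$old_ifs
for request in "$@"; do
    echo "${request%%=*} -> ${request#*=}"
done
```

Only the first "=" separates name and value, so values that contain
"=" themselves (as in "PARAM_l=lic=1") survive intact.
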