[GE users] job submission verifier
Ernst.Bablick at Sun.COM
Tue Sep 23 14:12:29 BST 2008
find comments inlined...
> On 23.09.2008 at 13:45, Ernst Bablick wrote:
>> in the past some users expressed their need for some kind of
>> presubmission procedure which is executed whenever a job enters the
>> GE system (see also issue #2621).
>> Find attached a draft for a corresponding GE enhancement. Please give
>> feedback by Friday.
>> Functional Specification: Job Submission Verifier
>> Version  Comment                                Date      Author
>> -------  -------------------------------------  --------  ------------
>> 0.1      Initial version                        ?         Andreas Haas
>> 0.5      Describe changes so that enhancement   17-09-08  Ernst
>>          can be implemented for Urubu with
>>          less performance loss
>> 0.6      added missing parts according to       22-09-08  Ernst
>>          discussion with RD and AS
>> 1 INTRODUCTION
>> In the past some of our users expressed their need for some kind of
>> presubmission procedure which is executed whenever a job enters the GE
>> system (see also issue #2621). Here are some examples of what should be
>> done in such a procedure:
>> - Check accounting DB to make sure the user has enough wall clock
>> hours in their account to run the requested job on the
>> slots for the requested time.
>> - Guarantee that the number of slots requested is a multiple of 16 for
>> parallel jobs.
>> - Verify that the user can write to various shared filesystems.
>> - Make sure that the user does not request certain -l resources that
>> might not behave the way the user expects them to (h_vmem,
>> - Add required resource requests that users don't know are
>> - Add a project request of the form -P queue_name where queue_name is
>> the queue used with the -q option.
>> - Make sure that the user hasn't messed up their ssh keys so
>> that they cannot ssh into compute nodes w/o a passphrase.
>> - Print out status messages and errors about the above as well as
>> printing out the queue, allocation account name, PE,
>> total number of tasks requested, and number of tasks per node
>> - Print out an motd-like message at the top of qsub output
>>> qsub job.sge
>> Please note that we strongly advise using the mvapich-devel MPI
>> stack for running jobs with more than 2048 MPI tasks.
>> --> Submitting 16 tasks...
>> --> Submitting 16 tasks/host...
>> --> Submitting exclusive job to 1 hosts...
>> 2 PROJECT OVERVIEW
>> 2.1 Project Aim
>> The aim of the project is to provide an interface enhancement for GE
>> that allows defining job verification/modification routines. These
>> routines will be executed on the client side, within the qmaster
>> process, or both, when a job enters the system.
>> 2.2 Project Benefit
>> The administrator of a GE cluster can define additional policies needed.
>> The GE cluster will not be loaded with jobs which would break a
>> policy if a job verification/modification routine is defined.
>> 2.3 Project Duration
>> 2.4 Project Dependencies
>> There are no known dependencies with other projects
>> 3 SYSTEM ARCHITECTURE
>> 3.1 Enhancement Functions
>> Here is the summary of the customer needs:
>> (N1) The administrator gets the possibility to define a job verification
>> procedure which will be executed in qsub, qrsh, qsh, qlogin, qmon
>> and applications using DRMAA, to evaluate a job before it is sent
>> to qmaster.
>> (N2) The administrator gets the possibility to define a job verification
>> procedure which will be executed on qmaster side before a job
>> is finally added to the qmaster data store or before the
>> modification of a job is finally accepted.
>> (N3) It will be possible to define under which user account the
>> verification procedure within the master is executed. By default
>> the script is executed as sgeadmin user. Within the client context
>> the script is executed as submit user.
>> (N4) Data defining the job will be provided to the verification
>> (N5) After evaluating a job the verification result might be one of:
>> * accept job
>> * correct parameters that are part of the job specification
>> * reject job
>> * temporarily reject job (it might be accepted later)
>> (N6) Nearly all parameters which define a job can be changed by the
>> verification procedure but there are some exceptions. The following
>> things are only available as read-only parameters:
>> * type (qsub job => qlogin ...)
>> * script file to be executed
>> * arguments passed to the job
>> * user who submitted the job
>> The job script content itself is not available in the job
>> submission verification script.
>> (N7) As a minimum requirement, at least the following parameters have to be
>> changeable by the job verification procedure in a first
>> * pe request
>> * resource requests (hard and soft)
>> * queue and host requests
>> * project request
>> Implementation notes and necessary steps:
>> (I1) (N1) and (N2) will be realized as a script. The script language can
>> be chosen by the administrator.
>> (I2) The script has to be written in a way so that it can be
>> like a loadsensor script. It has to accept commands and
>> parameters from stdin and return results via stdout.
>> It should not terminate until it gets a corresponding
>> (I3) A file named "client_jsv" and located in
>> will be started by the clients qsub, qrsh, qsh, qlogin and qmon and the
>> DRMAA library (N1) before a new job is sent to qmaster. This
>> script will be started under the user account of the user who
>> tries to start a new job.
>> (I4) The script to be evaluated in qmaster (N2) has to be
>> in the cluster configuration. The parameter will be named
>> "server_jsv" and, similar to "prolog" and "epilog", it will
>> allow specifying under which user privileges this procedure is
>> started. (N3)
>> (I5) One instance of server_jsv will be started during startup of
>> qmaster for each worker thread or whenever the cluster
>> configuration parameter changes or whenever the timestamp of the
>> script file changes.
>> (I6) The server side instances of the verification scripts are
>> to the worker threads via pipes. Parameters and commands
>> be sent to the scripts and the response is read from the
>> (I7) After the script has been started it has to respond to the
>> following commands. Please note that each command
>> might print ERROR=<message> to stdout to indicate an error.
>> command         action
>> --------------  ---------------------------------------------------
>> START           Trashes cached data and starts a verification for a
>>                 new job.
>>                 Prints STARTED to stdout.
>>                 After that the script accepts only a BEGIN or one or
>>                 multiple PARAM_<name>=<value> commands.
>> BEGIN           This command triggers the verification of provided
>>                 parameters set by PARAM_<name>=<value>.
>>                 Prints RESULT=<result> and optionally
>>                 RESULT_MSG=<message> or RESULT_MSG_LOG=<message>.
>>                 <result> might be:
>>                 * job is accepted without changes
>>                 * job is accepted but all PARAM_<name>... which have
>>                   been sent between the initial BEGIN and the final
>>                   RESULT have to be evaluated and applied to the job
>>                   before it is accepted.
>>                 * job is rejected
>>                 * job is rejected but might be accepted later
>>                 <message> is a user readable message which will be
>>                 sent to the client to be printed as GDI answer
>>                 (RESULT_MSG) or it will be printed to stdout of the
>>                 client command (RESULT_MSG_LOG on client side) or it
>>                 will be printed to the master messages file
>>                 (RESULT_MSG_LOG on master side).
>> PARAM_<name>=<value>
>>                 <name> and <value> are parameters and corresponding
>>                 values as documented in submit(1), e.g.
>>                 <name>       <value>
>>                 -----------  ---------------------
>>                 a            <date_time>
>>                 ac           <variable>[=<value>],...
>>                 b            "y" | "n"
>>                 Additionally the following names are supported:
>>                 CLIENT       "qsub" | "qsh" | "qlogin" | "qmon" |
>>                 CONTEXT      "client" | "server"
>>                              explains if the script is executed in a
>>                              client (N1) or in the master (N2)
>>                 JOB_ID       <job_id>
>>                              (only available on server side)
>>                 SCRIPT       <path_of_job_script>
>>                 SCRIPT_ARGS  <arguments_for_job_script>
>>                 USER         <submit_user_name>
>> QUIT            Terminates the job submission verification script
>> Example: Find below the data which is sent to the job
>> verification script when the following job is submitted:
>>> qsub -pe pe1 3 -hard -l lic=1 -soft -l q=all.q troete.sh
>> Please note that parameters that are not explicitly
>> requested by the submitter of a job are not passed
>> to the script. This means that e.g. "-b n" of qsub won't be
>> passed to the script because this is the default
>> when nothing else is specified.
>>     Input                        Output
>> 01) "START\n"
>> 02)                              "STARTED\n"
>> 03) "PARAM_CLIENT=qsub\n"
>> 04) "PARAM_USER=ernst\n"
>> 05) "PARAM_pe=pe1 3\n"
>> 06) "PARAM_hard=\n"
>> 07) "PARAM_l=lic=1\n"
>> 08) "PARAM_soft=\n"
>> 09) "PARAM_l=q=all.q\n"
>> 10) "PARAM_SCRIPT=troete.sh\n"
>> 11) "BEGIN\n"
>> 12)                              "PARAM_pe=pe1 4\n"
>> 13)                              "RESULT_MSG=no multiple of 4\n"
>> 14)                              "RESULT=CORRECT\n"
>> 15) "START\n"
>> 16)                              "STARTED\n"
>> 17) ...
>> 99) "QUIT\n"
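
To make the exchange above concrete, here is a minimal sketch of a script side of the protocol in Python. It is not part of the draft: the function names verify and run_session are made up for illustration, the "multiple of 4" policy is taken from the example trace, and the token ACCEPT for "accepted without changes" is an assumption, since only CORRECT appears in the example.

```python
MULTIPLE = 4  # policy from the example trace: PE slot count must be a multiple of 4


def verify(params):
    """Return the reply lines for one job, given its (name, value) pairs.

    Assumes the pe value has the form "<pe_name> <slots>" with a single
    slot count, as in the example; slot ranges are not handled.
    """
    for name, value in params:
        if name == "pe":
            pe_name, slots_text = value.split()
            slots = int(slots_text)
            if slots % MULTIPLE:
                fixed = (slots // MULTIPLE + 1) * MULTIPLE
                return [
                    f"PARAM_pe={pe_name} {fixed}",
                    f"RESULT_MSG=no multiple of {MULTIPLE}",
                    "RESULT=CORRECT",
                ]
    # ASSUMPTION: "ACCEPT" is a placeholder token for "accepted without
    # changes"; the draft text does not spell this token out.
    return ["RESULT=ACCEPT"]


def run_session(lines):
    """Consume command lines and yield reply lines (without newlines)."""
    params = []  # a list, because names such as "l" may occur more than once
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "START":
            params = []  # trash data cached for the previous job
            yield "STARTED"
        elif line.startswith("PARAM_"):
            # Split at the first "=" only, so "PARAM_l=lic=1" gives ("l", "lic=1").
            name, _, value = line[len("PARAM_"):].partition("=")
            params.append((name, value))
        elif line == "BEGIN":
            yield from verify(params)
        elif line == "QUIT":
            return
        else:
            yield f"ERROR=unknown command {line}"
```

To run it as a long-lived script in the loadsensor style, one would loop over `run_session(sys.stdin)` and print each reply with `flush=True`, so that the replies reach the pipe immediately instead of sitting in a buffer.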
> looks feasible. Questions:
> - are all options from "sge_request" already included here?
> - will -soft and -hard be grouped (maybe they should be mentioned per
> parameter for easier parsing)?
> - how are multiple resource requests coded? I mean "-l type1=5,type2=8":
> will it be "PARAM_type1=5\n" plus "PARAM_type2=8\n" or just in one
I would send one statement. Otherwise we would need to enhance the
protocol by commands which address elements in lists like in -l or -v so
that new elements can be added or removed by JSV scripts.
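
A rough Python sketch of what "one statement" could mean for a script: the whole -l list arrives as one value, the script edits it as a mapping and sends the complete list back. The helper names are hypothetical, "type3" is a made-up resource, and values that themselves contain commas are not handled.

```python
def parse_resource_list(value):
    """Split 'type1=5,type2=8' into {'type1': '5', 'type2': '8'}."""
    resources = {}
    for item in value.split(","):
        if not item:
            continue  # tolerate an empty list or a trailing comma
        name, _, setting = item.partition("=")
        resources[name] = setting
    return resources


def format_resource_list(resources):
    """Join the mapping back into the comma separated form."""
    return ",".join(f"{n}={v}" if v else n for n, v in resources.items())


requests = parse_resource_list("type1=5,type2=8")
requests["type2"] = "4"   # change an existing element
requests["type3"] = "1"   # add a new element (hypothetical resource name)
line = "PARAM_l=" + format_resource_list(requests)
```

With this approach adding or removing a list element needs no extra protocol command, at the price of the script having to resend the full list.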
> Somehow this means implementing a parser in the script to look for "="
> and strip off the "PARAM_". Maybe it would be easier to send these
> items by sending a line with:
> "PARAM" "CLIENT" "qsub"\n
> Then the script could simply use (note the use of ' and " for
> demonstration purpose):
> $ line='"PARAM" "CLIENT" "qsub"'
> $ eval set $line
> $ echo $1
> PARAM
> $ echo $2
> CLIENT
> $ echo $3
> qsub
> even this works:
> $ line='"PARAM" "l" "type" "with some blanks"'
> $ eval set $line
> $ echo $4
> with some blanks
You are right. At least here we can save some parsing effort. I will
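
For scripts not written in shell, the quoted-field format could be handled roughly as follows in Python. This is only a sketch: encode_fields and decode_fields are hypothetical helper names, and the escaping rule (backslash before an embedded double quote or backslash) is an assumption that the proposal above does not cover.

```python
import shlex


def encode_fields(*fields):
    """Render fields as a line of double-quoted words, e.g. "PARAM" "CLIENT" "qsub"."""
    quoted = []
    for field in fields:
        # ASSUMPTION: embedded backslashes and double quotes are backslash-escaped.
        escaped = field.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return " ".join(quoted)


def decode_fields(line):
    """Split a line of quoted words; shlex only tokenizes, nothing is executed."""
    return shlex.split(line)


line = encode_fields("PARAM", "l", "type", "with some blanks")
fields = decode_fields(line)
```

Here `fields[3]` holds the value including its blanks, which corresponds to `$4` in the shell example.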
> -- Reuti
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net