[GE users] job submission verifier

Ernst Bablick Ernst.Bablick at Sun.COM
Tue Sep 23 14:12:29 BST 2008



Hi Reuti,

please find my comments inline...

Reuti wrote:
> Hi,
>
> On 23.09.2008 at 13:45, Ernst Bablick wrote:
>
>> in the past some users expressed their need for some kind of 
>> presubmission procedure which is executed whenever a job enters the 
>> GE system (see also issue #2621).
>>
>> Find attached a draft for a corresponding GE enhancement. Please give 
>> feedback by Friday.
>>
>> Regards,
>>
>> Ernst
>>       Functional Specification: Job Submission Verifier
>>       =================================================
>>
>>       Version  Comment                                Date      Author
>>       -------  -------------------------------------  --------  -------------
>>       0.1      Initial version                        ?         Andreas Haas
>>       0.5      Describe changes so that enhancement   17-09-08  Ernst Bablick
>>                can be implemented for Urubu with
>>                less performance loss
>>       0.6      added missing parts according to       22-09-08  Ernst Bablick
>>                discussion with RD and AS
>>
>> 1     INTRODUCTION
>>       ============
>>
>>       In the past some of our users expressed their need for some kind of
>>       presubmission procedure which is executed whenever a job enters the
>>       GE system (see also issue #2621). Here are some examples of what
>>       should be done in such a procedure:
>>
>>       -  Check accounting DB to make sure the user has enough wall clock
>>          hours in their account to run the requested job on the requested
>>          slots for the requested time.
>>
>>       -  Guarantee that the number of slots requested is a multiple of 16
>>          for parallel jobs.
>>
>>       -  Verify that the user can write to various shared filesystems.
>>
>>       -  Make sure that the user does not request certain -l resources
>>          that might not behave the way the user expects them to (h_vmem,
>>          h_data, etc).
>>
>>       -  Add required resource requests that users don't know are
>>          mandatory.
>>
>>       -  Add a project request of the form -P queue_name where queue_name
>>          is the queue used with the -q option.
>>
>>       -  Make sure that the user hasn't messed up their ssh keys so badly
>>          that they cannot ssh into compute nodes w/o a passphrase.
>>
>>       -  Print out status messages and errors about the above as well as
>>          printing out the queue, allocation account name, PE,
>>          total number of tasks requested, and number of tasks per node
>>          requested.
>>
>>       -  Print out an motd-like message at the top of qsub output
>>
>>          > qsub job.sge
>>          Welcome
>>          -------
>>          Please note that we strongly advise using the mvapich-devel MPI
>>          stack for running jobs with more than 2048 MPI tasks.
>>          ---------------------------------------------------------------
>>          --> Submitting 16 tasks...
>>          --> Submitting 16 tasks/host...
>>          --> Submitting exclusive job to 1 hosts...
>>          ...
>>
>>
>> 2     PROJECT OVERVIEW
>>       ================
>>
>> 2.1   Project Aim
>>
>>       The aim of the project is to provide an interface enhancement for GE
>>       that allows administrators to define job verification/modification
>>       routines which will be executed on the client side, within the
>>       qmaster process, or both when a job enters the system.
>>
>> 2.2   Project Benefit
>>
>>       The administrator of a GE cluster can define additional policies
>>       as needed.
>>
>>       If a job verification/modification routine is defined, the GE
>>       cluster will not be loaded with jobs that would break a defined
>>       policy.
>>
>> 2.3   Project Duration
>>
>> 2.4   Project Dependencies
>>
>>       There are no known dependencies with other projects
>>
>>
>> 3     SYSTEM ARCHITECTURE
>>       ===================
>>
>> 3.1   Enhancement Functions
>>
>>       Here is the summary of the customer needs:
>>
>>       (N1)  The administrator can define a job verification procedure
>>             which will be executed in qsub, qrsh, qsh, qlogin, qmon and
>>             applications using DRMAA to evaluate a job before it is sent
>>             to qmaster.
>>
>>       (N2)  The administrator can define a job verification procedure
>>             which will be executed on the qmaster side before a job is
>>             finally added to the qmaster data store or before the
>>             modification of a job is finally accepted.
>>
>>       (N3)  It will be possible to define under which user account the
>>             verification procedure within the master is executed. By
>>             default the script is executed as the sgeadmin user. Within
>>             the client context the script is executed as the submit user.
>>
>>       (N4)  Data defining the job will be provided to the verification
>>             procedure.
>>
>>       (N5)  After evaluating a job the verification result might either be:
>>                *  accept job
>>                *  correct parameters that are part of the job specification
>>                *  reject job
>>                *  temporarily reject job (it might be accepted later)
>>
>>       (N6)  Nearly all parameters which define a job can be changed by the
>>             verification procedure, but there are some exceptions. The
>>             following are only available as read-only parameters:
>>                * type (qsub job => qlogin ...)
>>                * script file to be executed
>>                * arguments passed to the job
>>                * user who submitted the job
>>             The content of the job script itself is not available in the
>>             job submission verification script.
>>
>>
>>       (N7)  As a minimum requirement, at least the following parameters
>>             have to be changeable by the job verification procedure in a
>>             first implementation:
>>                * pe request
>>                * resource requests (hard and soft)
>>                * queue and host requests
>>                * project request
>>
>>       Implementation notes and necessary steps:
>>
>>       (I1)  (N1) and (N2) will be realized as scripts. The script language
>>             can be chosen by the administrator.
>>
>>       (I2)  The script has to be written so that it can be executed like
>>             a load sensor script. It has to accept commands and
>>             parameters from stdin and return results via stdout.
>>             It should not terminate until it gets a corresponding command.
>>
>>       (I3)  A file named "client_jsv" located in $SGE_ROOT/$SGE_CELL/common
>>             will be started by the clients qsub, qrsh, qsh, qlogin and
>>             qmon and by the DRMAA library (N1) before a new job is sent
>>             to qmaster. This script will be started under the account of
>>             the user who tries to start a new job.
>>
>>       (I4)  The script to be evaluated in qmaster (N2) has to be configured
>>             in the cluster configuration. The parameter will be named
>>             "server_jsv" and, similar to "prolog" and "epilog", it will
>>             allow specifying under which user privileges this procedure
>>             will be started. (N3)
>>
>>       (I5)  One instance of server_jsv will be started for each worker
>>             thread during startup of qmaster, whenever the cluster
>>             configuration parameter changes, and whenever the timestamp
>>             of the script file changes.
>>
>>       (I6)  The server-side instances of the verification scripts are
>>             connected to the worker threads via pipes. Parameters and
>>             commands will be sent to the scripts and the response is read
>>             from the script output.
>>
>>       (I7)  After the script has been started it has to respond to the
>>             following commands. Please note that each command
>>             might print ERROR=<message> to stdout to indicate an error.
>>
>>             command  action
>>             -------  ---------------------------------------------------------
>>             START    Trashes cached data and starts a verification for a
>>                      new job.
>>
>>                      Prints STARTED to stdout
>>
>>                      After that the script accepts only a BEGIN or one or
>>                      multiple PARAM_<name>=<value> commands
>>
>>             BEGIN    This command triggers the verification of provided
>>                      parameters set by PARAM_<name>=<value>
>>
>>                      Prints RESULT=<result> and optionally
>>                      RESULT_MSG=<message> or RESULT_MSG_LOG=<message>
>>
>>                      <result> might be:
>>                         ACCEPT
>>                            job is accepted without changes
>>                         CORRECT
>>                            job is accepted but all PARAM_<name>... which
>>                            have been sent between the initial BEGIN and the
>>                            final RESULT have to be evaluated and applied to
>>                            the job before it is accepted.
>>                         REJECT
>>                            job is rejected
>>                         REJECT_WAIT
>>                            job is rejected but might be accepted later
>>
>>                      <message> is a user readable message
>>                         which will be sent to the client to be printed as
>>                         GDI answer (RESULT_MSG) or it will be printed to
>>                         stdout of the client command (RESULT_MSG_LOG on
>>                         client side) or it will be printed to the master
>>                         messages file (RESULT_MSG_LOG on master side)
>>
>>             PARAM_<name>=<value>    <name> and <value> are parameter names
>>                      and corresponding values as documented in submit(1),
>>                      e.g.
>>
>>                      <name>      <value>
>>                      ----------- ---------------------
>>                      a           <date_time>
>>                      ac          <variable>[=<value>],...
>>                      b           "y" | "n"
>>                      ...
>>
>>                      additionally the following names are supported
>>
>>                      CLIENT      "qsub" | "qsh" | "qlogin" | "qmon" | "qalter"
>>                      CONTEXT     "client" | "server"
>>                                  explains if the script is executed in a
>>                                  client (N1) or in the master (N2)
>>                      JOB_ID      <job_id>
>>                                  (only available on server side)
>>                      SCRIPT      <path_of_job_script>
>>                      SCRIPT_ARGS <arguments_for_job_script>
>>                      USER        <submit_user_name>
>>
>>             QUIT     Terminates the job submission verification script
>>
>>             Example: Find below the data which is sent to the job
>>                     submission verification script when the following
>>                     job is submitted:
>>
>>                     > qsub -pe pe1 3 -hard -l lic=1 -soft -l q=all.q troete.sh
>>
>>                     Please note that parameters that are not explicitly
>>                     requested by the submitter of a job are not passed
>>                     to the script. This means that e.g. "-b n" of qsub
>>                     won't be passed to the script because this is the
>>                     default when nothing else is specified.
>>
>>                 Input                Output
>>
>>             01) "START\n"
>>             02)                      "STARTED\n"
>>             03) "PARAM_CLIENT=qsub\n"
>>             04) "PARAM_USER=ernst\n"
>>             05) "PARAM_pe=pe1 3\n"
>>             06) "PARAM_hard=\n"
>>             07) "PARAM_l=lic=1\n"
>>             08) "PARAM_soft=\n"
>>             09) "PARAM_l=q=all.q\n"
>>             10) "PARAM_SCRIPT=troete.sh\n"
>>             11) "BEGIN\n"
>>             12)                      "PARAM_pe=pe1 4\n"
>>             13)                      "RESULT_MSG=no multiple of 4\n"
>>             14)                      "RESULT=CORRECT\n"
>>
>>             15) "START\n"
>>             16)                      "STARTED\n"
>>             17) ...
>>
>>             99) "QUIT\n"
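
To make the command flow in the trace above concrete, here is a minimal
sketch of a verification script written against the draft protocol exactly
as quoted in this message (START, PARAM_<name>=<value>, BEGIN, QUIT). It is
an illustration only: the function names jsv_session, verify and round_up
are hypothetical, the "multiple of 4" policy is simply taken from the
example trace, and the protocol itself is still under discussion in this
thread, so nothing here should be read as the final interface.

```python
def round_up(value, multiple):
    """Round value up to the next multiple (e.g. 3 -> 4 for multiple=4)."""
    return -(-value // multiple) * multiple


def verify(params, multiple):
    """Hypothetical policy: the PE slot count must be a multiple of `multiple`."""
    for name, value in params:
        if name == "pe":
            pe_name, _, slots = value.partition(" ")
            if not slots.isdigit():
                yield "RESULT_MSG=cannot parse pe request"
                yield "RESULT=REJECT"
                return
            fixed = round_up(int(slots), multiple)
            if fixed != int(slots):
                # Send the corrected parameter before the final RESULT.
                yield "PARAM_pe=%s %d" % (pe_name, fixed)
                yield "RESULT_MSG=no multiple of %d" % multiple
                yield "RESULT=CORRECT"
                return
    yield "RESULT=ACCEPT"


def jsv_session(lines, multiple=4):
    """Yield the reply lines for a stream of draft-protocol command lines."""
    params = []  # (name, value) pairs cached for the current job
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "START":
            params = []  # trash cached data from any previous job
            yield "STARTED"
        elif line.startswith("PARAM_"):
            # Split only at the first "=", so "PARAM_l=lic=1" gives ("l", "lic=1").
            name, _, value = line[len("PARAM_"):].partition("=")
            params.append((name, value))
        elif line == "BEGIN":
            yield from verify(params, multiple)
        elif line == "QUIT":
            return
        else:
            yield "ERROR=unknown command: " + line


# Replay the input column of the example trace.
trace = [
    "START\n", "PARAM_CLIENT=qsub\n", "PARAM_USER=ernst\n",
    "PARAM_pe=pe1 3\n", "PARAM_hard=\n", "PARAM_l=lic=1\n",
    "PARAM_soft=\n", "PARAM_l=q=all.q\n", "PARAM_SCRIPT=troete.sh\n",
    "BEGIN\n", "QUIT\n",
]
for reply in jsv_session(trace):
    print(reply)
```

A real script would pass sys.stdin to jsv_session instead of the canned
trace and flush stdout after every reply, because the peer on the other end
of the pipe waits for each answer before sending the next command.
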
>
> looks feasible. Questions:
>
> - are all options from "sge_request" already included here?
Yes
>
> - will -soft and -hard be grouped (maybe they should be mentioned per
>   parameter for easier parsing)?
> - how are multiple resource requests encoded? I mean "-l type1=5,type2=8"
>
> will it be "PARAM_type1=5\n" plus "PARAM_type2=8\n" or just one
> statement?
I would send one statement. Otherwise we would need to extend the
protocol with commands that address individual elements of lists such as
those given to -l or -v, so that JSV scripts can add or remove elements.
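
With a single statement per option, a script that wants to add or remove
one resource request has to take the list value apart and put it back
together itself. A rough sketch of that, assuming the value arrives in the
same comma-separated form as on the command line ("type1=5,type2=8"); the
helper names are hypothetical and the naive split does not handle quoted or
escaped commas:

```python
def parse_resource_list(value):
    """Split 'type1=5,type2=8' into an ordered list of (name, value) pairs.

    A request without '=' is kept with a value of None.
    """
    pairs = []
    for item in value.split(","):
        if not item:
            continue  # ignore empty items such as a trailing comma
        name, sep, val = item.partition("=")
        pairs.append((name, val if sep else None))
    return pairs


def format_resource_list(pairs):
    """Join (name, value) pairs back into the comma-separated form."""
    return ",".join(
        name if val is None else "%s=%s" % (name, val) for name, val in pairs
    )


requests = parse_resource_list("type1=5,type2=8")
# Drop one request and add another one (names are purely illustrative).
requests = [(n, v) for n, v in requests if n != "type2"]
requests.append(("type3", "1"))
print("PARAM_l=" + format_resource_list(requests))
```
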
>
> Somehow this means implementing a parser in the script to look for "="
> and strip off the "PARAM_". Maybe it would be easier to send these
> items by sending a line with:
>
> "PARAM" "CLIENT" "qsub"\n
>
> Then the script could simply use (note the use of ' and " for
> demonstration purposes):
>
> $ line='"PARAM" "CLIENT" "qsub"'
> $ eval set $line
> $ echo $1
> PARAM
> $ echo $2
> CLIENT
> $ echo $3
> qsub
>
> even this works:
>
> $ line='"PARAM" "l" "type" "with some blanks"'
> $ eval set $line
> $ echo $4
> with some blanks
You are right. At least here we can save some parsing effort. I will 
change that...
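
For scripts not written in shell, the quoted-field format is just as easy
to consume. As a sketch only, assuming the line format proposed above (each
field wrapped in double quotes, fields separated by blanks): Python's
standard shlex module splits such a line with shell-like quoting rules but,
unlike eval, performs no expansion of the content. The helper names
parse_command and format_command are hypothetical.

```python
import shlex


def parse_command(line):
    """Split a line such as '"PARAM" "CLIENT" "qsub"' into its fields."""
    return shlex.split(line)


def format_command(fields):
    """Quote each field with double quotes so parse_command can read it back."""
    quoted = []
    for field in fields:
        # Escape backslashes first, then embedded double quotes.
        escaped = field.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"%s"' % escaped)
    return " ".join(quoted)


print(parse_command('"PARAM" "CLIENT" "qsub"'))
print(parse_command('"PARAM" "l" "type" "with some blanks"')[3])
print(format_command(["PARAM", "pe", "pe1 4"]))
```
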

Ernst
>
>
>
> -- Reuti
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>

