[GE users] job submission verifier

Ernst Bablick Ernst.Bablick at Sun.COM
Tue Sep 23 12:45:20 BST 2008



Hi,

in the past some users expressed their need for some kind of 
presubmission procedure which is executed whenever a job enters the GE 
system (see also issue #2621).

Find attached a draft for a corresponding GE enhancement. Please give 
feedback by Friday.

Regards,

Ernst



      Functional Specification: Job Submission Verifier 
      =================================================

      Version  Comment                                Date      Author
      -------  -------------------------------------  --------  -------------
      0.1      Initial version                        ?         Andreas Haas
      0.5      Describe changes so that enhancement   17-09-08  Ernst Bablick
               can be implemented for Urubu with
               less performance loss
      0.6      Added missing parts according to       22-09-08  Ernst Bablick
               discussion with RD and AS

1     INTRODUCTION
      ============

      In the past some of our users expressed their need for some kind of 
      presubmission procedure which is executed whenever a job enters the GE
      system (see also issue #2621). Here are some examples of what should be
      done in such a procedure:
       
      -  Check the accounting DB to make sure the user has enough wall clock 
         hours in their account to run the requested job on the requested 
         slots for the requested time. 

      -  Guarantee that the number of slots requested is a multiple of 16 for
         parallel jobs.
   
      -  Verify that the user can write to various shared filesystems.

      -  Make sure that the user does not request certain -l resources that 
         might not behave the way the user expects them to (h_vmem, h_data, 
         etc). 

      -  Add required resource requests that users don't know are mandatory. 

      -  Add a project request of the form -P queue_name where queue_name is 
         the queue used with the -q option.

      -  Make sure that the user hasn't broken their ssh keys so badly 
         that they cannot ssh into compute nodes without a passphrase.

      -  Print out status messages and errors about the above as well as 
         printing out the queue, allocation account name, PE, 
         total number of tasks requested, and number of tasks per node 
         requested.

      -  Print out an MOTD-like message at the top of the qsub output:

         > qsub job.sge
         Welcome
         -------
         Please note that we strongly advise using the mvapich-devel MPI
         stack for running jobs with more than 2048 MPI tasks.
         ---------------------------------------------------------------
         --> Submitting 16 tasks...
         --> Submitting 16 tasks/host...
         --> Submitting exclusive job to 1 hosts...
         ...
   

2     PROJECT OVERVIEW
      ================

2.1   Project Aim

      The aim of the project is to provide an interface enhancement for GE
      that allows the administrator to define job verification/modification
      routines which are executed on the client side, within the qmaster
      process, or both, when a job enters the system.

2.2   Project Benefit

      The administrator of a GE cluster can define additional policies as needed.

      If a job verification/modification routine is defined, the GE cluster
      will not be loaded with jobs that would violate a defined policy.

2.3   Project Duration

2.4   Project Dependencies

      There are no known dependencies on other projects.


3     SYSTEM ARCHITECTURE
      ===================

3.1   Enhancement Functions

      Here is the summary of the customer needs:
   
      (N1)  The administrator gets the possibility to define a job verification
            procedure which will be executed in qsub, qrsh, qsh, qlogin, qmon 
            and applications using DRMAA to evaluate a job before it is sent 
            to qmaster.

      (N2)  The administrator gets the possibility to define a job verification
            procedure which will be executed on qmaster side before a job 
            is finally added to the qmaster data store or before the 
            modification of a job is finally accepted.

      (N3)  It will be possible to define under which user account the
            verification procedure within the master is executed. By default 
            the script is executed as the sgeadmin user. Within the client
            context the script is executed as the submit user.

      (N4)  Data defining the job will be provided to the verification 
            procedure. 

      (N5)  After evaluating a job the verification result might either be:
               *  accept job
               *  correct parameters that are part of the job specification
               *  reject job 
               *  temporarily reject job (it might be accepted later)
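      The four outcomes of (N5) correspond to the RESULT tokens defined in
      (I7). A minimal Python sketch of that mapping; the names JsvResult,
      result_line and job_is_accepted are hypothetical and not part of GE.

```python
from enum import Enum


class JsvResult(Enum):
    """Verification outcomes of (N5) and the token a script prints for each."""
    ACCEPT = "ACCEPT"            # accept job without changes
    CORRECT = "CORRECT"          # accept job with corrected parameters
    REJECT = "REJECT"            # reject job
    REJECT_WAIT = "REJECT_WAIT"  # reject now, job might be accepted later


def result_line(result: JsvResult) -> str:
    """Format the RESULT line a verification script writes to stdout."""
    return f"RESULT={result.value}\n"


def job_is_accepted(result: JsvResult) -> bool:
    """Only ACCEPT and CORRECT let the job enter the system."""
    return result in (JsvResult.ACCEPT, JsvResult.CORRECT)
```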

      (N6)  Nearly all parameters which define a job can be changed by the 
            verification procedure, but there are some exceptions. The
            following are available only as read-only parameters:
               * type (qsub job => qlogin ...)
               * script file to be executed
               * arguments passed to the job  
               * user who submitted the job
            The content of the job script itself is not available to the job
            submission verification script.
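      A script or client could guard against attempts to change the
      read-only items of (N6). The sketch below assumes they map onto the
      PARAM names CLIENT, SCRIPT, SCRIPT_ARGS and USER from (I7); treating
      "type" as CLIENT is a guess, and check_corrections is a hypothetical
      helper.

```python
# Assumption: the read-only items of (N6) correspond to these PARAM names
# from (I7). Mapping "type" to CLIENT is a guess, not part of the spec.
READ_ONLY_PARAMS = frozenset({"CLIENT", "SCRIPT", "SCRIPT_ARGS", "USER"})


def check_corrections(corrections: dict[str, str]) -> list[str]:
    """Return, sorted, the names of corrected parameters that are read-only."""
    return sorted(name for name in corrections if name in READ_ONLY_PARAMS)
```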
      
   
      (N7)  As a minimum requirement, at least the following parameters have
            to be changeable by the job verification procedure in a first 
            implementation:
               * pe request 
               * resource requests (hard and soft)
               * queue and host requests 
               * project request

      Implementation notes and necessary steps:

      (I1)  (N1) and (N2) will be realized as scripts. The script language can 
            be chosen by the administrator. 

      (I2)  The script has to be written so that it can be executed like a
            loadsensor script: it accepts commands and parameters from stdin
            and returns results via stdout. It must not terminate until it
            receives the QUIT command.
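      A minimal Python skeleton of such a loadsensor-style loop, assuming
      the command set described in (I7). It accepts every job; serve() is a
      hypothetical name, and a real script would evaluate the collected
      parameters when BEGIN arrives.

```python
import sys


def serve(infile, outfile):
    """Answer JSV commands read line by line until QUIT (sketch)."""
    params = []
    for raw in infile:
        line = raw.rstrip("\n")
        if line == "START":
            params.clear()                        # trash data of the previous job
            outfile.write("STARTED\n")
        elif line.startswith("PARAM_"):
            params.append(line[len("PARAM_"):])   # keep order, names may repeat
        elif line == "BEGIN":
            outfile.write("RESULT=ACCEPT\n")      # placeholder policy: accept all
        elif line == "QUIT":
            break
        else:
            outfile.write(f"ERROR=unknown command {line}\n")
        outfile.flush()                           # the caller waits for each answer


if __name__ == "__main__":
    serve(sys.stdin, sys.stdout)
```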

      (I3)  A file named "client_jsv" and located in $SGE_ROOT/$SGE_CELL/common
            will be started by the clients qsub, qrsh, qsh, qlogin and qmon and
            by the DRMAA library (N1) before a new job is sent to qmaster. This
            script will be started under the account of the user who tries
            to submit the new job.

      (I4)  The script to be evaluated in qmaster (N2) has to be configured
            in the cluster configuration. The parameter will be named
            "server_jsv" and, similar to "prolog" and "epilog", it will
            allow the administrator to specify under which user account this
            procedure is started (N3).
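      If server_jsv reuses the optional "user@" prefix that prolog and
      epilog accept in sge_conf(5), the value could be split as sketched
      below. The "[user@]path" syntax for server_jsv is an assumption, as is
      the helper name parse_server_jsv; the default account follows (N3).

```python
def parse_server_jsv(value: str, default_user: str = "sgeadmin") -> tuple[str, str]:
    """Split a hypothetical "[user@]path" server_jsv value into (user, path)."""
    user, sep, path = value.partition("@")
    # Treat the text before "@" as a user name only if it looks like one,
    # so that a path which merely contains "@" is not misread.
    if sep and user and "/" not in user:
        return user, path
    return default_user, value  # no prefix: run as the admin user (N3)
```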

      (I5)  One instance of server_jsv will be started for each worker thread
            during qmaster startup. The instances will be restarted whenever
            the cluster configuration parameter changes or whenever the
            timestamp of the script file changes. 
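      The restart condition of (I5) amounts to comparing the configured
      path and the file's modification time against the values seen when
      the instances were last started. A Python sketch with a hypothetical
      class name; qmaster's actual bookkeeping may differ.

```python
import os


class JsvRestartCheck:
    """Decide whether the server_jsv instances must be restarted (sketch)."""

    def __init__(self, configured_path: str):
        self.path = configured_path
        self.mtime = os.stat(configured_path).st_mtime_ns

    def needs_restart(self, configured_path: str) -> bool:
        if configured_path != self.path:       # configuration parameter changed
            self.path = configured_path
            self.mtime = os.stat(configured_path).st_mtime_ns
            return True
        mtime = os.stat(configured_path).st_mtime_ns
        if mtime != self.mtime:                # script file timestamp changed
            self.mtime = mtime
            return True
        return False
```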
      
      (I6)  The server side instances of the verification scripts are connected
            to the worker threads via pipes. Parameters and commands will
            be sent to the scripts, and the responses are read from the
            script output.
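      A sketch of the master-side conversation for one job over such pipes,
      written in Python with subprocess rather than inside qmaster. The
      function names are hypothetical; lines the script prints before RESULT
      are sorted into corrections (PARAM_...) and messages (RESULT_MSG,
      RESULT_MSG_LOG, ERROR).

```python
import subprocess


def start_jsv(argv):
    """Start one JSV instance connected via pipes, in line-buffered text mode."""
    return subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            text=True, bufsize=1)


def send(proc, line):
    proc.stdin.write(line + "\n")
    proc.stdin.flush()


def verify_job(proc, params):
    """Run START/PARAM/BEGIN for one job; return (result, corrections, messages)."""
    send(proc, "START")
    if proc.stdout.readline().strip() != "STARTED":
        raise RuntimeError("JSV did not answer START")
    for name, value in params:
        send(proc, f"PARAM_{name}={value}")
    send(proc, "BEGIN")
    corrections, messages = [], []
    while True:
        line = proc.stdout.readline()
        if not line:
            raise RuntimeError("JSV terminated unexpectedly")
        key, _, value = line.rstrip("\n").partition("=")
        if key == "RESULT":
            return value, corrections, messages
        if key.startswith("PARAM_"):
            corrections.append((key[len("PARAM_"):], value))
        else:                              # RESULT_MSG, RESULT_MSG_LOG, ERROR
            messages.append((key, value))
```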

      (I7)  After the script has been started it has to respond to the
            following commands. Please note that each command 
            might print ERROR=<message> to stdout to indicate an error.

            command  action
            -------  --------------------------------------------------------- 
            START    Trashes cached data and starts a verification for a 
                     new job. 
                                 
                     Prints STARTED to stdout

                     After that, the script accepts only BEGIN or one or 
                     more PARAM_<name>=<value> commands.

            BEGIN    This command triggers the verification of provided 
                     parameters set by PARAM_<name>=<value>

                     Prints RESULT=<result> and optionally 
                     RESULT_MSG=<message> or RESULT_MSG_LOG=<message> 

                     <result> might be:
                        ACCEPT         
                           job is accepted without changes
                        CORRECT        
                           job is accepted, but all PARAM_<name>=<value> lines
                           the script printed between the initial BEGIN and
                           the final RESULT have to be evaluated and applied
                           to the job before it is accepted.
                        REJECT         
                           job is rejected
                        REJECT_WAIT    
                           job is rejected but might be accepted later 

                     <message> is a user readable message
                        which will be sent to the client to be printed as
                        GDI answer (RESULT_MSG) or it will be printed to
                        stdout of the client command (RESULT_MSG_LOG on
                        client side) or it will be printed to the master 
                        messages file (RESULT_MSG_LOG in master side)

            PARAM_<name>=<value>    <name> and <value> are parameter names 
                     and corresponding values as documented in submit(1) e.g.

                     <name>      <value>
                     ----------- ---------------------
                     a           <date_time>
                     ac          <variable>[=<value>],...
                     b           "y" | "n"
                     ...

                     Additionally, the following names are supported:

                     CLIENT      "qsub" | "qsh" | "qlogin" | "qmon" | "qalter"
                     CONTEXT     "client" | "server"
                                 indicates whether the script is executed in a
                                 client (N1) or in the master (N2)
                     JOB_ID      <job_id>
                                 (only available on server side)
                     SCRIPT      <path_of_job_script>
                     SCRIPT_ARGS <arguments_for_job_script>
                     USER        <submit_user_name>

            QUIT     Terminates the job submission verification script
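      Values can themselves contain "=" (e.g. PARAM_l=lic=1 in the example
      below), and names such as l can repeat, so a parser should split at the
      first "=" only and preserve order. The sketch below also assumes that,
      as in qsub, -l requests count as hard until a -soft flag appears; the
      function names parse_param and resource_requests are hypothetical.

```python
def parse_param(line: str) -> tuple[str, str]:
    """Split 'PARAM_<name>=<value>' at the first '=' only."""
    if not line.startswith("PARAM_"):
        raise ValueError(f"not a PARAM command: {line!r}")
    name, sep, value = line[len("PARAM_"):].partition("=")
    if not sep:
        raise ValueError(f"missing '=' in {line!r}")
    return name, value


def resource_requests(lines):
    """Group -l values by the most recent -hard/-soft flag (hard by default)."""
    scope = "hard"
    requests = {"hard": [], "soft": []}
    for line in lines:
        name, value = parse_param(line)
        if name in ("hard", "soft"):
            scope = name
        elif name == "l":
            requests[scope].append(value)
    return requests
```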

            Example: Find below the data which is sent to the job submission
                     verification script when the following job is submitted:

                    > qsub -pe pe1 3 -hard -l lic=1 -soft -l q=all.q troete.sh 

                    Please note that parameters that are not explicitly 
                    requested by the submitter of a job are not passed
                    to the script. This means that e.g. "-b n" of qsub won't be
                    passed to the script because this is the default
                    when nothing else is specified.

                Input                Output

            01) "START\n" 
            02)                      "STARTED\n"
            03) "PARAM_CLIENT=qsub\n"
            04) "PARAM_USER=ernst\n"
            05) "PARAM_pe=pe1 3\n"
            06) "PARAM_hard=\n"
            07) "PARAM_l=lic=1\n"
            08) "PARAM_soft=\n"
            09) "PARAM_l=q=all.q\n" 
            10) "PARAM_SCRIPT=troete.sh\n"
            11) "BEGIN\n"
            12)                      "PARAM_pe=pe1 4\n"
            13)                      "RESULT_MSG=no multiple of 4\n"
            14)                      "RESULT=CORRECT\n"

            15) "START\n"
            16)                      "STARTED\n"
            17) ...

            99) "QUIT\n"
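      The policy in the example (round the PE slot count up to a multiple
      of 4) could produce output lines 12-14 as follows. This is an
      illustrative Python sketch: verify_pe is a hypothetical name, and it
      assumes a single slot count rather than a range such as 2-8.

```python
def verify_pe(params: dict[str, str], multiple: int = 4) -> list[str]:
    """Round the requested PE slot count up to a multiple (example policy)."""
    pe = params.get("pe")
    if pe is None:
        return ["RESULT=ACCEPT"]
    pe_name, slots_text = pe.split()   # assumes "<pe_name> <slots>", no ranges
    slots = int(slots_text)
    if slots % multiple == 0:
        return ["RESULT=ACCEPT"]
    fixed = -(-slots // multiple) * multiple   # ceiling to the next multiple
    return [f"PARAM_pe={pe_name} {fixed}",
            f"RESULT_MSG=no multiple of {multiple}",
            "RESULT=CORRECT"]
```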



