[GE users] Job Re-schedule based on incoming users to pending queue

Daniel Templeton Dan.Templeton at Sun.COM
Fri Jul 6 16:57:04 BST 2007


Another trick for bouncing jobs might be to create a checkpointing 
environment that doesn't do anything. Any job that uses a checkpointing 
environment will be rescheduled instead of suspended, but because this 
checkpointing environment doesn't actually checkpoint anything, the 
effect is just to bounce the job. I haven't tested this idea yet, but it 
sounds plausible. :)
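
To make the idea concrete, here is a minimal, untested sketch of such a do-nothing checkpointing environment. The name "bounce", the /tmp checkpoint directory and the REGISTER_CKPT switch are assumptions made for illustration only. The field names follow my reading of checkpoint(5), where "when x" should mean "act when the job is suspended". The script only registers the environment with qconf if you explicitly ask it to.

```shell
#!/bin/sh
# Write a checkpointing environment definition whose commands are all
# "none", so nothing is ever checkpointed. "bounce" is a hypothetical name.
write_bounce_ckpt() {
    cat > "$1" <<'EOF'
ckpt_name          bounce
interface          userdefined
ckpt_command       none
migr_command       none
restart_command    none
clean_command      none
ckpt_dir           /tmp
signal             none
when               x
EOF
}

ckpt_file=${TMPDIR:-/tmp}/bounce_ckpt.$$
write_bounce_ckpt "$ckpt_file"
cat "$ckpt_file"          # show what would be registered

# Only contact the qmaster when explicitly requested (REGISTER_CKPT is an
# assumed switch for this sketch) and when qconf is actually installed.
if [ "${REGISTER_CKPT:-no}" = yes ] && command -v qconf >/dev/null 2>&1; then
    qconf -Ackpt "$ckpt_file"
fi
rm -f "$ckpt_file"
```

Jobs would then be submitted with "qsub -ckpt bounce", and the environment presumably also has to appear in the ckpt_list of the queues involved.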

Daniel

Olesen, Mark wrote:
>>> Is there a way, using the policies or otherwise, to suspend/kill a
>>> running job and re-schedule it back into the pending queue, based on
>>> incoming users to the pending queue?
>>>
>>> I.e., if one user is using/running all licenses of a given software
>>> (suppose 10 jobs) but a new user submits his jobs to the pending
>>> queue, I want to kill/suspend 5 of the first user's running jobs and
>>> put them back into the pending queue so that the new incoming user
>>> can get 5 jobs running right away. The users will all be equally
>>> entitled (have the same number of functional tickets, for example).
>>>       
>
> You might attempt the following (untested).
> Submit all jobs with the '-notify' parameter (see qsub manpage).
> You can then set up a trap function within the job script:
>
> #
> # signal GridEngine to requeue job
> #
> job_suspend() {
>    timeInfo=`date +'%Y-%m-%dT%H:%M:%S'`
>    echo "(**) caught suspend request at $timeInfo"
>    exit 99
> }
>
> # with '-notify' we receive
> #   STOP => USR1 (suspend)
> #   KILL => USR2 (kill)
> trap 'job_suspend' USR1
>
>
> You can misuse a load sensor script (qmaster only!) to regularly determine
> which jobs, if any, need to be suspended.
>
> # qloadsensor
> while :
> do
>    read input || exit 1		# wait for input
>    [ "$input" = quit ] && exit 0
>
>    echo begin			# begin load report
>    host_info			# host information
>    iidle_info			# machine's idle time
>    echo end			      # end load report
>
>    # let scripts run between load reports
>    SGE_qmaster=`act_qmaster`	# refresh the name of the qmaster
>    if [ "$HOST" = "$SGE_qmaster" ]; then
>       # CALL_YOUR_SCRIPT_HERE
>       # express - force rescheduling of jobs
>       $SGE_site/qxprs >/dev/null 2>&1
>    fi
> done
> exit 0			# we never get here, but just in case
>
>
> The 'qxprs' script is where you have to implement your program logic.
>   * licenses in use
>   * licenses wanted by queued jobs
>   * qmod -sj/-sq to suspend jobs/queue_instances 
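
As a rough illustration of the 'qxprs' logic listed above, here is an untested sketch of one possible selection step. It assumes that every running job holds exactly one license and that each user's fair share is a fixed number (SHARE, defaulting to 5 as in the 10-license, two-user example). The helper name pick_excess_jobs and the RUN_QXPRS switch are made up for this sketch, and the qstat column positions (job id in column 1, owner in column 4, two header lines) are an assumption about the default output format.

```shell
#!/bin/sh
# pick_excess_jobs SHARE
#   stdin : one running job per line, "<job_id> <owner>"
#   stdout: ids of the jobs beyond each owner's first SHARE jobs
pick_excess_jobs() {
    awk -v share="$1" '{ if (++count[$2] > share) print $1 }'
}

SHARE=${SHARE:-5}

# Only touch the cluster when explicitly requested (RUN_QXPRS is an
# assumed switch for this sketch) and when qstat is actually installed.
if [ "${RUN_QXPRS:-no}" = yes ] && command -v qstat >/dev/null 2>&1; then
    # Assumed layout: two header lines, job id in $1, owner in $4.
    qstat -u '*' -s r | awk 'NR > 2 { print $1, $4 }' |
        pick_excess_jobs "$SHARE" |
        while read -r job_id; do
            # The job's USR1 trap (with '-notify') then exits 99 to requeue.
            qmod -sj "$job_id"
        done
fi
```

A real version would also have to check whether anyone is actually waiting for a license before suspending anything, and restrict the job list to jobs that request the license in question.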
>
> The above will likely not work with OpenMPI.
>
>
> Let us know how you get on.
>
> /mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>   