[GE users] Job Re-schedule based on incoming users to pending queue

Olesen, Mark Mark.Olesen at emcontechnologies.com
Fri Jul 6 16:47:59 BST 2007

> > Is there a way, using the policies or otherwise, to suspend/kill a
> > /running/ job and re-schedule it back into the /pending/ queue based on
> > incoming users to the pending queue?
> >
> > I.e., if one user is using/running all licenses of a given software
> > (say 10 jobs) and a new user submits his jobs to the pending
> > queue, I want to kill/suspend 5 of the first user's running jobs and
> > put them back into the pending queue so that the new incoming user
> > can get 5 jobs running right away. The users will all be equally
> > entitled (have the same number of functional tickets, for example).

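You might attempt the following (untested).
Submit all jobs with the '-notify' option (see the qsub man page), so that
Grid Engine sends SIGUSR1 to a job shortly before suspending it and SIGUSR2
shortly before killing it. You can then set up a trap function within the
job script: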

# signal GridEngine to requeue job
job_suspend() {
   timeInfo=`date +'%Y-%m-%dT%H:%M:%S'`
   echo "(**) caught suspend request at $timeInfo"
   exit 99     # exit status 99: Grid Engine requeues the job
}

# with '-notify' we receive
#   USR1 shortly before STOP (suspend)
#   USR2 shortly before KILL (kill)
trap 'job_suspend' USR1
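The trap only helps if the shell gets control back while the workload is still running: a POSIX shell defers a trap until the current foreground command has finished, but a trapped signal interrupts the 'wait' builtin at once. A minimal sketch of a complete job-script wrapper, assuming the real workload is passed as arguments to a hypothetical 'run_job' helper (also untested under Grid Engine itself):

```shell
#!/bin/sh
# Sketch of a job-script wrapper for jobs submitted with 'qsub -notify'.
# 'run_job' and 'child' are hypothetical names chosen for this example.

# Exit status 99 asks Grid Engine to requeue the job.
job_suspend() {
   timeInfo=`date +'%Y-%m-%dT%H:%M:%S'`
   echo "(**) caught suspend request at $timeInfo"
   # stop the workload (if started) so it releases its license
   [ -n "$child" ] && kill "$child" 2>/dev/null
   exit 99
}

run_job() {
   child=
   # with '-notify', USR1 arrives shortly before the SIGSTOP of a suspend
   trap 'job_suspend' USR1
   # Start the workload in the background and 'wait' for it: a trapped
   # signal interrupts 'wait' at once, whereas a trap would otherwise be
   # deferred until a foreground command had finished.
   "$@" &
   child=$!
   wait "$child"
}

# last line of the job script (the solver command is hypothetical):
#   run_job ./my_solver input.dat
```

If the workload finishes normally, 'run_job' returns its exit status, so the job script ends the way it would have without the wrapper.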

You can (mis)use a load sensor script, with the real work done on the
qmaster host only, to periodically determine which jobs, if any, need to
be suspended.

# qloadsensor
# host_info, iidle_info, act_qmaster and $SGE_site are site-specific
# helpers/settings (not shown here)
while :
do
   read input || exit 1          # wait for input
   [ "$input" = quit ] && exit 0

   echo begin                    # begin load report
   host_info                     # host information
   iidle_info                    # machine's idle time
   echo end                      # end load report

   # let scripts run between load reports
   SGE_qmaster=`act_qmaster`     # refresh the name of the qmaster
   if [ "$HOST" = "$SGE_qmaster" ]; then
      # express - force rescheduling of jobs
      $SGE_site/qxprs >/dev/null 2>&1
   fi
done
exit 0                           # we never get here, but just in case
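To try the begin/end load-sensor protocol outside Grid Engine, here is a minimal self-contained sensor that answers each input line except 'quit' with one report. It is a sketch: 'report_load' stands in for the site-specific helpers in the script above, and 'site_heartbeat' is a hypothetical complex (it would presumably have to be added with 'qconf -mc' before the scheduler could use it).

```shell
#!/bin/sh
# Minimal load sensor sketch; 'report_load', 'load_sensor' and the
# complex 'site_heartbeat' are hypothetical names.
report_load() {
   # one "host:complex:value" line per reported value
   echo "$myhost:site_heartbeat:`date +%s`"
}

load_sensor() {
   myhost=`uname -n`
   while read input; do
      [ "$input" = quit ] && return 0   # asked to stop
      echo begin                        # begin load report
      report_load
      echo end                          # end load report
      # work between reports (e.g. running qxprs) would go here
   done
   return 1                             # stdin closed without 'quit'
}
```

For example, `printf '\nquit\n' | load_sensor` prints 'begin', one line of the form `host:site_heartbeat:seconds`, and 'end'.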

The 'qxprs' script is where you have to implement your program logic:
  * determine the licenses in use
  * determine the licenses requested by queued (pending) jobs
  * use qmod -sj / -sq to suspend jobs / queue instances
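One way the decision step in 'qxprs' could work, sketched under stated assumptions: give every user who wants a license an equal share and pick each user's newest running jobs above that share for suspension. The input format ('jobid user' per line, oldest first) and the helper names 'jobs_to_suspend' and 'list_running' are hypothetical; collecting the list (e.g. from qstat) is not shown.

```shell
#!/bin/sh
# Hypothetical decision step for 'qxprs'.
# Input on stdin: one running license job per line, "jobid user",
# oldest job first.  $1 = total licenses, $2 = number of users wanting
# licenses (running plus pending).  Prints the job ids to suspend.
jobs_to_suspend() {
   awk -v total="$1" -v nusers="$2" '
      NF >= 2 { n[$2]++; job[$2, n[$2]] = $1 }
      END {
         if (nusers < 1) exit
         share = int(total / nusers)     # equal licenses per user
         if (share < 1) share = 1        # never take a user to zero
         for (u in n)
            # the newest jobs above the share are picked first
            for (i = n[u]; i > share; i--)
               print job[u, i]
      }'
}

# in qxprs ('list_running' is hypothetical); with '-notify' the job
# trap's 'exit 99' then puts each suspended job back into the queue:
#   for id in `list_running | jobs_to_suspend 10 2`; do qmod -sj "$id"; done
```

For the case in the question (10 licenses, one user running 10 jobs, a second user pending) this prints the first user's 5 newest job ids. Taking the newest jobs first is a choice made here because they have presumably done the least work before being requeued; it is not something Grid Engine requires.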

The above will likely not work with OpenMPI.

Let us know how you get on.
