> Perhaps if qlicserver always ran in step with scheduling runs (i.e.
> right before) then this race condition wouldn't exist? I hope my
> description of the problem is a little clearer now? Thanks for your
> feedback!

Try the qlicserver '-l' option in your prolog.
I think there is an example in the package and/or an explanation in the
wiki. If not, here is the quick prolog:

# prolog
# $Id: prolog,v 1.7 2006/05/22 08:48:33 cfdadmin Exp $
echo "start ($JOB_ID)" `date -Is 2>/dev/null`

# <environ>
# --------------------------------------------------------------
: ${SGE_ROOT:=/opt/sge}
: ${SGE_CELL:=default}
for i in $SGE_ROOT/$SGE_CELL/site/environ; do [ -f $i ] && . $i; done
# --------------------------------------------------------------
# </environ>

# we got this far, we can drop the -hold_jid
$SGE_BINARY_PATH/qalter -hold_jid 0 $JOB_ID > /dev/null 2>&1

# (hard) requested resources
rclist=`$SGE_BINARY_PATH/qstat -r -j $JOB_ID | \
   sed -ne 's/^.*hard *resource_list: *//p'`

# <resource_check>
# -------------------------------------------------------------
# verify that the expected resources actually exist
# this should prevent the race condition that occurs between SGE jobs
# before the load report (available licenses) gets updated

# NB: each exec_host must also be a submit host for this to work

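# NB: set $query to the full path of the qlicserver executable beforehand
# (e.g. in the site environ file sourced above)
if [ -n "$rclist" -a -x "$query" ]; then
   echo "query resources   $rclist,slots=$NSLOTS,JOB_ID=$JOB_ID"
   available=`$query -l $rclist,slots=$NSLOTS,JOB_ID=$JOB_ID`
   exitcode=$?   # save the query's exit status before anything else runs

   case "$exitcode" in
     0 ) ;;  # okay
     99 )
      echo "re-queue job      $available"
      echo "-------------------------"
      exit 99
      ;;
     * )
      echo "error with license query $exitcode"
      exit $exitcode
      ;;
   esac
fi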
# -----------------------------------------------------------------
# </resource_check>

exit 0  # always report success ?
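
If you want to try the exit-code handling of the resource check without a
running cluster, something like the following rough sketch should do. Note
that query_stub and prolog_action are made-up names for illustration only:
query_stub stands in for the "$query -l ..." call and simply returns whatever
code it is given, and prolog_action prints the decision instead of exiting.

```shell
#!/bin/sh
# Sketch only: query_stub is a hypothetical stand-in for "$query -l ...";
# it just returns the exit code passed as its first argument.
query_stub() {
    return "$1"
}

# Map the exit code of the license query to what the prolog would do.
# Prints the action rather than calling exit, so it can be tried by hand.
prolog_action() {
    query_stub "$1"
    exitcode=$?        # capture immediately, before any other command runs
    case "$exitcode" in
        0 )  echo "run" ;;              # resources are there, start the job
        99 ) echo "requeue" ;;          # prolog would 'exit 99'
        * )  echo "error $exitcode" ;;  # prolog would 'exit $exitcode'
    esac
}

prolog_action 0
prolog_action 99
prolog_action 2
```

The point to take away is that $? has to be saved directly after the query,
since any command in between (even an echo) overwrites it.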

Something similar might be possible in the epilog to keep the license count
straight, but I haven't had that problem before.


