#273 new defect

IZ1790: shepherd does not wait for terminate_method to complete

We attempted to send a user-specified signal to a job on termination,
using the following script:

        iad2mgt02% more

        # $1 is the job PID ($job_pid); a negative PID signals the whole process group.
        if [ "$SG_NOTIFY_SIGNAL" != "" ] ; then
          kill -$SG_NOTIFY_SIGNAL -$1
          if [ "$SG_NOTIFY_SLEEP_TIME" != "" ] ; then
             if [ $SG_NOTIFY_SLEEP_TIME -gt 0 ] ; then
                sleep $SG_NOTIFY_SLEEP_TIME
                kill -9 -$1
             else
                sleep 10
                kill -9 -$1
             fi
          else
             sleep 10
             kill -9 -$1
          fi
        else
          kill -9 -$1
        fi

Then we ran qconf -mq all.q and set
        terminate_method=/home/sgeadmin/n1ge60/ $job_pid

Unfortunately, when we qdel a job, Grid Engine immediately reports the job
as completed, so our clean-up process can start at any time. As a result,
the job output may be packed into a tarball and sent to the user well
before $SG_NOTIFY_SLEEP_TIME has elapsed, and therefore before any
signal-triggered activity has completed.
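
Until the shepherd itself waits for the terminate_method, one possible
stopgap on our side is to have the clean-up process wait until the job's
process group has actually disappeared before it packs the output. The
sketch below is only an illustration: wait_for_pgrp is a hypothetical helper
of our own, not part of Grid Engine, and it assumes the clean-up process runs
on the execution host and knows the job's process group ID (the same value
the script above receives as $1).

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): block until no process
# in the given process group remains, or until a timeout expires.
# usage: wait_for_pgrp PGID TIMEOUT_SECONDS
wait_for_pgrp() {
    pgid=$1
    timeout=$2
    waited=0
    # kill -0 sends no signal; it only tests whether the group still exists.
    while kill -0 "-$pgid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # group still alive after the timeout
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0            # group is gone
}
```

A clean-up script could call, for example, wait_for_pgrp "$pgid" 120 and
only build the tarball when it returns 0. This does not change what Grid
Engine reports for the job; it only delays our own post-processing.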

Is there a way to get Grid Engine to leave the process in "dr" state until
the terminate_method script has completed? This should be handled in the
same way as migration or checkpoint methods.

