[GE users] Re: [GE SGE-6.0u4: Job Suspend does not work for child processes.

Olesen, Mark Mark.Olesen at arvinmeritor.com
Thu Mar 8 12:28:09 GMT 2007



This might be somewhat related.
I am having problems trapping signals in the job script.

I am using SGE 6.0u9 and openmpi-1.2rc1, but the same problem appeared with
openmpi-1.2b3.

The job is submitted with
#$ -S /bin/sh -cwd -j y -notify

Within the script I define a simple function:

user_abort() {
   echo "(**) abort processes"

   touch ABORT

   # wait for the application to remove the ABORT file
   while :
   do
      if [ -f ABORT ]; then
         sleep 5
      else
         break
      fi
   done
}
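
The handshake that user_abort relies on can be tried in isolation. In the
sketch below, fake_app is a hypothetical stand-in for the real application,
which is assumed to poll for the ABORT file, shut down, and then remove it.
The temporary directory and the one-second polling intervals are only for
illustration.

```shell
#!/bin/sh
# Work in a scratch directory so the ABORT file does not clash with anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

user_abort() {
   echo "(**) abort processes"
   touch ABORT
   # wait for the application to remove the ABORT file
   while [ -f ABORT ]; do
      sleep 1
   done
}

# Hypothetical stand-in for the application: poll for ABORT, then remove it.
fake_app() {
   while [ ! -f ABORT ]; do
      sleep 1
   done
   echo "application: ABORT seen, shutting down"
   rm -f ABORT
}

fake_app &
app_pid=$!

user_abort            # returns once the stand-in has removed ABORT
wait "$app_pid"
app_status=$?
echo "application exit status: $app_status"
```

If the application never removes the file, user_abort loops until the
scheduler sends the real KILL, so a timeout on the loop may be worth adding.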

And then trap the signals:
#
# with '-notify' we receive
#   STOP => USR1 (suspend)
#   KILL => USR2 (kill)
#
trap 'user_abort' USR1 USR2

Before finally starting

mpirun --mca mpi_yield_when_idle 1 APPLICATION ARGS
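
One shell detail that matters for this layout: while a command runs in the
foreground, the shell does not execute a trap handler until that command has
finished. A common workaround is to start the command in the background and
block in 'wait', which a trapped signal interrupts. The following is a
minimal sketch of that pattern only. 'sleep 3' is a hypothetical stand-in for
the mpirun line, the helper subshell merely simulates the notify signal, and
whether this arrangement behaves well under Grid Engine with Open MPI is
untested here.

```shell
#!/bin/sh
caught=0
on_notify() {
   caught=1
   echo "(**) notify signal trapped"
}
trap 'on_notify' USR1 USR2

# Hypothetical stand-in for: mpirun --mca mpi_yield_when_idle 1 APPLICATION ARGS
sleep 3 &
app_pid=$!

# Simulate the notify signal arriving at the job script after one second.
( sleep 1; kill -USR2 $$ ) &

# 'wait' returns early with a status above 128 when a trapped signal
# arrives, and the handler runs right away.
wait "$app_pid"
wait_status=$?

# A second 'wait' collects the real exit status of the command.
wait "$app_pid"
app_status=$?

echo "caught=$caught wait_status=$wait_status app_status=$app_status"
```

In a real job script the handler would call user_abort instead of only
setting a flag.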


Interestingly enough, the signal seems to be making its way past the shell
trap and getting to the mpi daemon.

mpirun: Forwarding signal 12 to job
[dealc14:18706] ERROR: A daemon on node dealc13.DOMAIN failed to start as expected.
[dealc14:18706] ERROR: There may be more information available from
[dealc14:18706] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[dealc14:18706] ERROR: If the problem persists, please restart the
[dealc14:18706] ERROR: Grid Engine PE job
[dealc14:18706] The daemon received a signal 12.
[dealc14:18706] ERROR: A daemon on node dealc14.DOMAIN failed to start as expected.
[dealc14:18706] ERROR: There may be more information available from
[dealc14:18706] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[dealc14:18706] ERROR: If the problem persists, please restart the
[dealc14:18706] ERROR: Grid Engine PE job
[dealc14:18706] The daemon received a signal 12.
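
A shell behaviour that may be relevant to the output above: a trap with a
handler applies only to the shell itself. Commands started by the shell get
the default disposition for that signal again, whereas a signal that the
shell ignores (trap '') stays ignored in its children unless they install
their own handler. So if the notify signal is delivered to more than just
the job script (an assumption, not verified here for 6.0u9), the trap in the
script would not shield mpirun from it. The sketch below shows only the shell
side, with 'sleep' as a hypothetical stand-in for a child process. It says
nothing about how mpirun itself reacts to USR2.

```shell
#!/bin/sh
# Case 1: the parent traps USR2 with a handler. The child does not inherit
# the handler, so USR2 terminates it.
trap 'echo "parent: caught USR2"' USR2

sleep 30 &
caught_child=$!
sleep 1                       # give the child time to start before signalling
kill -USR2 "$caught_child"
wait "$caught_child"
caught_status=$?              # above 128: the child was killed by the signal

# Case 2: the parent ignores USR2. The ignored disposition is inherited,
# so the child runs to completion.
trap '' USR2

sleep 2 &
ignored_child=$!
sleep 1
kill -USR2 "$ignored_child"
wait "$ignored_child"
ignored_status=$?             # 0: the child was unaffected

echo "caught_status=$caught_status ignored_status=$ignored_status"
```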

Is there a syntax error, is the order of the statements wrong, is something
else wrong, or is this a feature?

/mark
