[GE users] Suspending jobs submitted with -notify
Dan.Gruhn at Group-W-Inc.com
Fri Jan 28 13:18:48 GMT 2005
I've been wrestling with qsub -notify for a week or more myself. Please
take a look at issue #1440 to see the problems I have observed.
What I have found works for USR1 is to put the following in my job script:
trap "" USR1
This tells the shell to simply ignore the USR1 signal. I found that
trying to handle it, even just to output a status message, was a problem
if I wasn't planning to exit afterwards. In my case, script1 usually
called another script, script2, and if USR1 arrived and I tried to output
a message, it looked to script1 as if script2 had returned with an exit
status greater than 128 (the shell's way of reporting that a command was
killed by a signal).
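The difference between handling and ignoring USR1 can be reproduced outside Grid Engine. The bash sketch below assumes the signal reaches the child script as well as the job script (simulated here by signalling the child directly), and the inline `bash -c` command is a hypothetical stand-in for script2. It relies on the POSIX rule that caught signals are reset to their default action when a new program is executed, while ignored signals stay ignored.

```shell
#!/bin/bash
# Sketch: why 'trap "" USR1' carries over into child scripts but a USR1
# handler does not. Assumption: the signal reaches the child as well as the
# job script; here we simply send it to the child ourselves.

# Start a stand-in for "script2", send it USR1, and record its exit status.
run_child() {
    bash -c 'sleep 2; exit 0' &   # hypothetical stand-in for script2
    local pid=$!
    sleep 1                       # give the child time to start
    kill -USR1 "$pid"
    last_status=0
    wait "$pid" || last_status=$?
}

# Case 1: the job script installs a USR1 handler. Caught signals revert to
# their default action in the child, so the child is killed and the status
# is 128 + the signal number (e.g. 138 on Linux x86, where SIGUSR1 is 10).
trap 'echo "caught USR1" >&2' USR1
run_child
status_handler=$last_status
echo "with handler: child exit status $status_handler"

# Case 2: the job script ignores USR1. Ignored signals stay ignored across
# fork and exec, so the child is unaffected and finishes normally.
trap "" USR1
run_child
status_ignored=$last_status
echo "ignored:      child exit status $status_ignored"
```

Ignoring rather than catching is what lets the setting propagate to script2, which is why the handler version produces the confusing "greater than 128" status.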
Are you using a script for your job?
Have you tried just ignoring USR1?
What have you set your notify time to on your job queues?
Note that USR2 is very helpful: it tells your job that it is about to be
killed, so you can do some cleanup first. It does have some problems,
though, as I have noted in issue #1440.
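A minimal sketch of a job script that ignores USR1 and uses USR2 to clean up before the kill might look like the following. The scratch directory, the exit status 99 and the sleep loop are hypothetical placeholders for real job logic, and the small driver at the end only imitates Grid Engine by sending USR2 itself; under qsub -notify the time between USR2 and SIGKILL is set by the queue's notify parameter.

```shell
#!/bin/bash
# Write a sample job script to a temporary file. SCRATCH, exit status 99
# and the sleep loop are hypothetical placeholders for real job logic.
job=$(mktemp)
cat > "$job" <<'EOF'
#!/bin/bash
SCRATCH=$(mktemp -d)            # hypothetical per-job scratch directory
echo "$SCRATCH" > "$1"          # demo only: tell the driver where it is

cleanup() {
    rm -rf "$SCRATCH"           # keep it short: SIGKILL follows after the notify time
    exit 99                     # hypothetical "terminated after notify" status
}
trap cleanup USR2
trap "" USR1                    # ignore the suspend warning

# Hypothetical work loop. Short sleeps let bash run the trap promptly,
# because a trap only runs after the current foreground command finishes.
while :; do sleep 1; done
EOF

# Driver standing in for Grid Engine: start the job, then send USR2.
pathfile=$(mktemp)
bash "$job" "$pathfile" &
pid=$!
sleep 2
kill -USR2 "$pid"
status=0
wait "$pid" || status=$?
scratch=$(cat "$pathfile")
rm -f "$job" "$pathfile"
echo "job exit status: $status"
if [ -d "$scratch" ]; then echo "scratch left behind"; else echo "scratch removed"; fi
```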
On Fri, 2005-01-28 at 05:05, Olle Liljenzin wrote:
> I have problems with jobs submitted with 'qsub -notify' getting killed
> when suspending them with 'qmod -sj'.
> I have set up a trap for all signals that can be caught. When the job is
> suspended it reports that SIGUSR1 was caught, but a moment later the
> process is just gone. The only explanation I can see for its
> disappearing is that, directly after SIGUSR1, it receives a second signal
> that kills the process.
> Suspending a job that was submitted with 'qrsh -notify' works as
> expected. The process gets a SIGUSR1 and after a while it is suspended.
> Is there something I should change in the configuration, or do I just
> not understand how it is supposed to work?
> I'm running version 6.0u3 and I have tried it on Linux, Solaris, AIX and
> HP-UX with the same result.