[GE users] Pb w/ suspending job ...

Dan Gruhn Dan.Gruhn at Group-W-Inc.com
Fri May 20 17:20:59 BST 2005


Chanh,

You can't change the default action, but you can change the handling of
the signals in your program or script.

For a C/C++ program, you would use the signal system call.  Read the
manual page (man signal).  The basic description is:

       #include <signal.h>

       typedef void (*sighandler_t)(int);

       sighandler_t signal(int signum, sighandler_t handler);

You can ignore the USR1 signal with the call:
signal(SIGUSR1, SIG_IGN);

If you have a shell script, use kill -l to list the signal
names/numbers:

Bsh> kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP      6) SIGABRT      7) SIGBUS       8) SIGFPE
 9) SIGKILL     10) SIGUSR1     11) SIGSEGV     12) SIGUSR2
13) SIGPIPE     14) SIGALRM     15) SIGTERM     17) SIGCHLD
18) SIGCONT     19) SIGSTOP     20) SIGTSTP     21) SIGTTIN
22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO
30) SIGPWR      31) SIGSYS      33) SIGRTMIN    34) SIGRTMIN+1
35) SIGRTMIN+2  36) SIGRTMIN+3  37) SIGRTMIN+4  38) SIGRTMIN+5
39) SIGRTMIN+6  40) SIGRTMIN+7  41) SIGRTMIN+8  42) SIGRTMIN+9
43) SIGRTMIN+10 44) SIGRTMIN+11 45) SIGRTMIN+12 46) SIGRTMIN+13
47) SIGRTMIN+14 48) SIGRTMIN+15 49) SIGRTMAX-15 50) SIGRTMAX-14
51) SIGRTMAX-13 52) SIGRTMAX-12 53) SIGRTMAX-11 54) SIGRTMAX-10
55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7  58) SIGRTMAX-6
59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX


Then add a trap command to the script to ignore USR1 and USR2:

trap "" USR1 USR2

or, if your system doesn't support signal names in the trap call:

trap "" 10 12

The signal numbers should be the same on your system, but they do vary
between platforms, so check them against the kill -l output on your
execution hosts.
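
A minimal sketch of that double check, assuming bash is installed on the host (its kill builtin translates a signal name into its number; other shells may not). The variable names are only for illustration. It looks up the numbers, ignores both signals by number, and then sends them to itself to confirm the script survives:

```shell
#!/bin/sh
# Sketch only: check the signal numbers on this host, then ignore both signals.
# Assumption: bash is installed; its kill builtin maps a name to a number.
usr1=$(bash -c 'kill -l USR1')
usr2=$(bash -c 'kill -l USR2')
echo "USR1=$usr1 USR2=$usr2"

# Ignore both signals by number, as in the numeric trap form above.
trap "" "$usr1" "$usr2"

# Send the signals to this shell; they are discarded, so the script carries on.
kill -s USR1 $$
kill -s USR2 $$
survived=yes

# An ignored signal stays ignored in commands started from the script,
# so this child shell also survives sending USR1 to itself.
child_status=0
sh -c 'kill -s USR1 $$; exit 0' || child_status=$?

echo "survived=$survived child_status=$child_status"
```

The inheritance shown in the last step only holds for ignored signals. A trap that runs a command is reset to the default action in the programs the script starts, so a program that needs its own cleanup on USR1 has to install its own handler.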

Dan

On Fri, 2005-05-20 at 11:35, TRAN Chanh wrote:

> Reuti wrote:
> 
> > If -notify is set, the jobs are warned with a SIGUSR1 before SIGSTOP. 
> > The default action of SIGUSR1 is to terminate the process, unless you 
> > trap the signal to do some custom procedures on your own (and this 
> > means to trap it in each program in the process group).
> 
> How do you change the default action of these SIGUSR signals?
> 
> >
> > For parallel jobs, the parallel lib may face some timeouts. But just try.
> 
> I've successfully suspended my 'multi-proc' jobs but haven't had the 
> chance to try with 'multi-node' jobs ....
> 
> >
> >
> > CU - Reuti
> >
> >
> > TRAN Chanh wrote:
> >
> >>
> >>
> >> Reuti wrote:
> >>
> >>> Were these just plain serial jobs? There is indeed the possibility 
> >>> to change the suspend/resume method, but the built-in:
> >>
> >>
> >>
> >> Currently, all the jobs are plain serial ones.
> >> BTW, I just discovered that I had the '-notify' option in my 'qsub', and 
> >> after removing it my 'suspend' problem is gone.
> >> I must say I'm happy with this, but I would still be interested in 
> >> an explanation of why it had this effect ...
> >>
> >> Actually, the next step for me is to suspend 'multi-proc' jobs and 
> >> 'multi-node' jobs, and I hope everything will work out fine.
> >>
> >> Thanks again,
> >> Chanh
> >>
> >>>
> >>> kill -stop -- -<pid>
> >>>
> >>> should stop the whole process group. Did you define any procedures 
> >>> on your own? Are some forks/threads of your application jumping out 
> >>> of the process group? - Reuti
> >>>
> >> Otherwise, I don't have any specific procedure of my own ...
> >>
> >>>
> >>> TRAN Chanh wrote:
> >>>
> >>>> Hi Reuti,
> >>>>
> >>>> Actually, I did try to do this :
> >>>> 1. via 'qmon->jobs->suspend ....'
> >>>> 2. qmod -s job_id
> >>>>
> >>>> Both give the same result.
> >>>>
> >>>> Chanh
> >>>>
> >>>> Reuti wrote:
> >>>>
> >>>>> Chanh,
> >>>>>
> >>>>> which SGE commands did you use in detail to suspend and unsuspend 
> >>>>> your jobs? - Reuti
> >>>>>
> >>>>> TRAN Chanh wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I'm using SGE 5.3p6 and am trying to suspend my running jobs and 
> >>>>>> then put them 'back to work' via 'resume'.
> >>>>>> What happens is that, instead of being suspended as with 'kill 
> >>>>>> -SIGSTOP', the jobs are all aborted as with 'kill -9'.
> >>>>>> Is there any way to change this behavior?
> >>>>>>
> >>>>>> Thanks a lot for any help,
> >>>>>> Chanh
> >>>>>>
> >>>>>> --------------------------------------------------------------------- 
> >>>>>>
> >>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net


