[GE users] Pb w/ suspending job ...

TRAN Chanh chanh.tran at dassault-aviation.fr
Fri May 20 17:47:11 BST 2005


Thanks a lot, Dan. I'll give it a try ASAP.

Chanh

Dan Gruhn wrote:

> Chanh,
>
> You can't change the default action, but you can change the handling 
> of the signals in your program or script.
>
> For a C/C++ program, you would use the signal system call.  Read the 
> manual page (man signal).  The basic description is:
>
>        #include <signal.h>
>
>        typedef void (*sighandler_t)(int);
>
>        sighandler_t signal(int signum, sighandler_t handler);
>
> You can ignore the USR1 signal with the call:
> signal(SIGUSR1, SIG_IGN);
>
> If you have a shell script, use kill -l to list the signal names/numbers:
>
> Bsh> kill -l
> 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
> 5) SIGTRAP      6) SIGABRT      7) SIGBUS       8) SIGFPE
> 9) SIGKILL     10) SIGUSR1     11) SIGSEGV     12) SIGUSR2
> 13) SIGPIPE     14) SIGALRM     15) SIGTERM     17) SIGCHLD
> 18) SIGCONT     19) SIGSTOP     20) SIGTSTP     21) SIGTTIN
> 22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
> 26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO
> 30) SIGPWR      31) SIGSYS      33) SIGRTMIN    34) SIGRTMIN+1
> 35) SIGRTMIN+2  36) SIGRTMIN+3  37) SIGRTMIN+4  38) SIGRTMIN+5
> 39) SIGRTMIN+6  40) SIGRTMIN+7  41) SIGRTMIN+8  42) SIGRTMIN+9
> 43) SIGRTMIN+10 44) SIGRTMIN+11 45) SIGRTMIN+12 46) SIGRTMIN+13
> 47) SIGRTMIN+14 48) SIGRTMIN+15 49) SIGRTMAX-15 50) SIGRTMAX-14
> 51) SIGRTMAX-13 52) SIGRTMAX-12 53) SIGRTMAX-11 54) SIGRTMAX-10
> 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7  58) SIGRTMAX-6
> 59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
> 63) SIGRTMAX-1  64) SIGRTMAX
>
>
> Then you can add a trap command to ignore USR1 and USR2:
>
> trap "" USR1 USR2
>
> or, if your system doesn't support signal names in the trap call:
>
> trap "" 10 12
>
> The signal numbers should be the same, but they differ between 
> platforms, so you should double check with kill -l.
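
As a minimal sketch of the trap approach above, assuming a POSIX shell job script: instead of ignoring the warning signals outright, the script can install a handler. The handler name on_notify and the counter notified are hypothetical, and the script signals itself only to simulate the warning that -notify would send.

```shell
#!/bin/sh
# Sketch: survive the USR1/USR2 warnings instead of being terminated by them.
notified=0

# Hypothetical handler: only records that a warning arrived.
# A real job script might write a checkpoint here instead.
on_notify() {
    notified=$((notified + 1))
}

# Install the handler by signal name (use the numbers from kill -l
# if the shell does not accept names).
trap on_notify USR1 USR2

# Simulate the warnings by signalling this shell itself.
kill -s USR1 "$$"
kill -s USR2 "$$"

echo "still running, warnings seen: $notified"
```

Using trap "" USR1 USR2 as quoted above is simpler when the job has nothing to do on the warning; a handler is only worth it if the script wants to react before it is stopped.
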
>
> Dan
>
> On Fri, 2005-05-20 at 11:35, TRAN Chanh wrote:
>
>>Reuti wrote:
>>
>>> If -notify is set, the jobs are warned with a SIGUSR1 before SIGSTOP. 
>>> The default action of SIGUSR1 is to terminate the process, unless you 
>>> trap the signal to do some custom procedures on your own (and this 
>>> means to trap it in each program in the process group).
>>
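
The remark about trapping in each program can be illustrated with a small sketch, assuming a POSIX shell: a signal that the script ignores stays ignored in the programs it starts, whereas the default action applies again once the ignore is removed. The variable names ignored_status and default_status are illustrative only, and a program may still install its own signal handling after it starts.

```shell
#!/bin/sh
# Sketch: an ignored signal is inherited by child processes.
trap "" USR1

# The child shell signals itself; it survives because USR1 is ignored.
sh -c 'kill -s USR1 "$$"; echo "child survived USR1"'
ignored_status=$?

# Restore the default action in this script.
trap - USR1

# Now the same child is terminated by USR1 and never reaches the echo.
sh -c 'kill -s USR1 "$$"; echo "child survived USR1"'
default_status=$?

echo "with ignore: $ignored_status, with default action: $default_status"
```

A handler set with trap (as opposed to the empty string) is not passed on this way: programs started from the script get the default action back, which is why each program would need its own handling.
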
>>How do you change the default action of these SIGUSR signals?
>>
>>>
>>> For parallel jobs, the parallel lib may face some timeouts. But just try.
>>
>>I've successfully suspended my 'multi-proc' jobs but haven't had the 
>>chance to try with 'multi-node' jobs yet.
>>
>>>
>>>
>>> CU - Reuti
>>>
>>>
>>> TRAN Chanh wrote:
>>>
>>>>
>>>>
>>>> Reuti wrote:
>>>>
>>>>> Were these just plain serial jobs? There is indeed the possibility 
>>>>> to change the suspend/resume method, but the built-in:
>>>>
>>>>
>>>>
>>>> Currently, all the jobs are plain serial ones.
>>>> BTW, I just discovered that I had the '-notify' option in my 'qsub', and 
>>>> after removing it my 'suspend' problem is gone.
>>>> I must say I'm happy with this, but I'm still interested in 
>>>> an explanation of why it had this effect ...
>>>>
>>>> Actually, the next step for me is to suspend 'multi-proc' jobs and 
>>>> 'multi-node' jobs and hope everything will work out fine.
>>>>
>>>> Thanks again,
>>>> Chanh
>>>>
>>>>>
>>>>> kill -stop -- -<pid>
>>>>>
>>>>> should stop the whole process group. Did you define any procedures 
>>>>> on your own? Are some forks/threads of your application jumping out 
>>>>> of the process group? - Reuti
>>>>>
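
To make the difference between suspending and aborting concrete, here is a minimal sketch, assuming a POSIX shell. It stops and resumes a single background process rather than a whole process group, so that it stays self-contained; the built-in method quoted above uses a negative pid to reach the group. The sleep command stands in for a real job and the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: STOP suspends a process, CONT resumes it; neither ends it.
sleep 30 &
pid=$!

# Suspend the process. It stays in the process table.
kill -s STOP "$pid"
if kill -0 "$pid" 2>/dev/null; then
    alive=yes
else
    alive=no
fi

# Resume it, then end the demo process and collect its status.
kill -s CONT "$pid"
kill -s TERM "$pid"
wait "$pid"
exit_status=$?

echo "alive while stopped: $alive, final wait status: $exit_status"
```
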
>>>> Otherwise, I don't have any specific procedure of my own ...
>>>>
>>>>>
>>>>> TRAN Chanh wrote:
>>>>>
>>>>>> Hi Reuti,
>>>>>>
>>>>>> Actually, I tried both of these:
>>>>>> 1. via 'qmon->jobs->suspend ....'
>>>>>> 2. qmod -s job_id
>>>>>>
>>>>>> Both give the same result.
>>>>>>
>>>>>> Chanh
>>>>>>
>>>>>> Reuti wrote:
>>>>>>
>>>>>>> Chanh,
>>>>>>>
>>>>>>> Which SGE commands did you use in detail to suspend and unsuspend 
>>>>>>> your jobs? - Reuti
>>>>>>>
>>>>>>> TRAN Chanh wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'm using SGE 5.3p6 and trying to suspend my running jobs and 
>>>>>>>> then put them back to work via 'resume'.
>>>>>>>> What happens is that, instead of being suspended as with 
>>>>>>>> 'kill -SIGSTOP', these jobs are all aborted as with 'kill -9'.
>>>>>>>> Is there any way to change this behavior?
>>>>>>>>
>>>>>>>> Thanks a lot for any help,
>>>>>>>> Chanh
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------- 
>>>>>>>>
>>>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>

