[GE users] qdel implements a SIGKILL signal ?

Reuti reuti at staff.uni-marburg.de
Wed May 30 10:25:32 BST 2007


Goncalo,

On 30.05.2007 at 10:39, goncalo at lip.pt wrote:

> Hi Daniel and Reuti,
>
> Thanks for the answers.
> I'll try to implement your suggestions so that "qdel" uses
> SIGTERM instead of SIGKILL. I'll report back...

Besides sending a SIGTERM, I would suggest that your terminate_method
also waits about 2 minutes and then always issues a SIGKILL as a last
resort.
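A minimal sketch of such a terminate_method script follows. It assumes the queue is configured as `terminate_method /path/to/terminate.sh $job_pid` (the path is hypothetical; check queue_conf(5) that your version expands `$job_pid`) and that the job's PID is also its process group ID, as the default `kill -9 -- -pid` implies. The function name `terminate_group` and the `GRACE` variable are illustrative.

```shell
#!/bin/sh
# Hypothetical terminate_method script, assumed to be configured as:
#   terminate_method   /path/to/terminate.sh $job_pid
# Assumes the job's PID is also its process group ID (pgid).
# terminate_group and GRACE are illustrative names.

GRACE=${GRACE:-120}   # seconds to wait between SIGTERM and SIGKILL

terminate_group() {
    pgid=$1
    # First ask politely: SIGTERM to every process in the group.
    kill -s TERM -- "-$pgid" 2>/dev/null || return 0   # group already gone
    waited=0
    while [ "$waited" -lt "$GRACE" ]; do
        # Signal 0 delivers nothing; it only tests whether the group
        # can still be signalled, i.e. whether it still exists.
        kill -s 0 -- "-$pgid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done
    # Last resort: SIGKILL cannot be trapped or ignored.
    kill -s KILL -- "-$pgid" 2>/dev/null
    return 0
}

# Act only when a pgid is passed (keeps the file safe to source).
if [ "$#" -ge 1 ]; then
    terminate_group "$1"
fi
```

Polling with signal 0 lets the script return as soon as the group is gone instead of always sleeping for the full grace period.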

Another (faster) option could be to submit the jobs with -notify and
define SIGTERM as the warning signal in the SGE configuration:

execd_params                 NOTIFY_KILL=SIGTERM

If this SIGTERM doesn't end the job within the time defined in the
queue configuration (the queue's notify attribute), a SIGKILL will follow.
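For the -notify route, the job itself has to react to the warning. The sketch below is a hypothetical wrapper (`wrapper.sh`, `./my_program` and both function names are illustrative) that assumes `NOTIFY_KILL=SIGTERM` as above and that the warning reaches the job shell. It forwards SIGTERM to the real worker and exits before the queue's notify time runs out.

```shell
#!/bin/sh
# Hypothetical job wrapper for the -notify approach, submitted e.g. as:
#   qsub -notify wrapper.sh ./my_program arg1 arg2
# Assumes execd_params contains NOTIFY_KILL=SIGTERM, so the warning that
# precedes the final SIGKILL arrives as SIGTERM. Names are illustrative.

child=

forward_term() {
    echo "warning signal received, stopping worker $child"
    if [ -n "$child" ]; then
        kill -s TERM "$child" 2>/dev/null
        wait "$child"   # reap the worker before the SIGKILL deadline
    fi
    exit 143            # 128 + 15: conventional "terminated by SIGTERM"
}

run_with_term_forwarding() {
    trap forward_term TERM
    # Start the worker in the background: a trap only runs after a
    # foreground command has finished, but "wait" is interrupted at once.
    "$@" &
    child=$!
    wait "$child"
}

# Act only when a command is given (keeps the file safe to source).
if [ "$#" -gt 0 ]; then
    run_with_term_forwarding "$@"
fi
```

If SGE delivers the warning to the whole process group rather than only to the job shell, the forwarded SIGTERM is merely redundant. With /bin/sh and no job control the background worker stays in the job's process group, so the final SIGKILL still reaches it.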

Cheers - Reuti


> Cheers
> Goncalo
>
>
>
> Quoting Reuti <reuti at staff.uni-marburg.de>:
>
>> On 29.05.2007 at 18:59, Daniel Templeton wrote:
>>
>>> Goncalo,
>>>
>>> By default, Grid Engine uses SIGKILL. The terminate_method queue
>>> attribute allows you to override that, though, with whatever signal
>>> or script you want. Children are found and killed via the
>>> additional group ID assigned to the job.
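Goncalo's observation can be reproduced without SGE. This sketch (the helper `run_with_signal` is illustrative; it assumes `mktemp` is available) starts a shell that traps TERM, sends it either SIGTERM or SIGKILL, and prints what the shell managed to write. Only the SIGTERM run reaches the handler, because SIGKILL cannot be caught.

```shell
#!/bin/sh
# Reproduces the thread's test without SGE: a shell that traps TERM is
# sent SIGTERM or SIGKILL, and we print what its handler wrote.
# run_with_signal is an illustrative helper name.

run_with_signal() {
    out=$(mktemp)
    # Child shell: install a TERM trap, then wait on a background sleep.
    sh -c 'trap "echo handler ran; exit 143" TERM; sleep 5 & wait' >"$out" 2>&1 &
    pid=$!
    sleep 1                    # give the child time to install its trap
    kill -s "$1" "$pid"
    wait "$pid" 2>/dev/null || :
    cat "$out"
    rm -f "$out"
}

echo "TERM: $(run_with_signal TERM)"
echo "KILL: $(run_with_signal KILL)"
```

The TERM line should read `handler ran` and the KILL line should be empty. In both runs the `sleep 5` outlives its parent shell, since only the parent was signalled; that is the orphaned-child problem this thread is about.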
>>
>> AFAIK this depends on a setting in the SGE configuration (adding
>> "ENABLE_ADDGRP_KILL" to "execd_params", with possible side effects
>> on Linux). The default is to kill the process group with something
>> like:
>>
>> kill -9 -- -pid
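The difference between signalling a PID and signalling a process group can be seen with a short sketch, assuming Linux with util-linux `setsid` and procps `ps` (the helper names `alive_in_group` and `demo` are illustrative). It starts a shell with two background children in a fresh process group, kills only the leader's PID, and then kills the whole group with a negative PID, as in `kill -9 -- -pid`.

```shell
#!/bin/sh
# Signalling one PID versus a whole process group (negative PID).
# Assumes Linux with util-linux setsid(1) and procps ps(1).
# alive_in_group and demo are illustrative helper names.

# Print how many live (non-zombie) processes belong to process group $1.
alive_in_group() {
    ps -e -o pgid=,stat= | awk -v g="$1" '$1 == g && $2 !~ /^Z/ { n++ } END { print n + 0 }'
}

demo() {
    # A shell with two background children in a fresh process group;
    # setsid makes the shell's PID equal to the group ID.
    setsid sh -c 'sleep 300 & sleep 300 & wait' >/dev/null 2>&1 &
    pg=$!
    sleep 1
    kill -s KILL "$pg"             # only the leader (the plain PID)
    wait "$pg" 2>/dev/null || :    # reap it so it is not counted
    sleep 1
    echo "after killing the PID:   $(alive_in_group "$pg") left"
    kill -s KILL -- "-$pg"         # like "kill -9 -- -pid": the whole group
    sleep 1
    echo "after killing the group: $(alive_in_group "$pg") left"
}

demo
```

It should report 2 processes left after the first kill and 0 after the second: killing the leader alone orphans its children, while the group kill reaches them.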
>>
>> So the best option is to ensure that none of the started processes
>> escapes from the process tree.
>>
>> Anyway, I wouldn't start background tasks with & in SGE jobs at all
>> (at least csh seems to create a new process group for each
>> background task it starts).
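Whether a task started with & stays in the job's process group can be checked directly. A small sketch, assuming procps `ps` (the helper `pgid_of` is illustrative): in a non-interactive /bin/sh job control is off, so the background task should inherit the shell's process group and a group kill still reaches it. A shell that gives each background task its own group, as suspected of csh above, would show two different IDs.

```shell
#!/bin/sh
# Does a task started with & stay in the job shell's process group?
# Assumes procps ps(1); pgid_of is an illustrative helper name.

# Print the process group ID of PID $1.
pgid_of() {
    ps -o pgid= -p "$1" | tr -d ' '
}

sleep 30 &
task=$!
echo "job shell pgid:       $(pgid_of $$)"
echo "background task pgid: $(pgid_of "$task")"
kill "$task"    # clean up the demo task
```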
>>
>> -- Reuti
>>
>>
>>> Daniel
>>>
>>> goncalo at lip.pt wrote:
>>>> Hi there,
>>>>
>>>> I have a very important question regarding the SGE "qdel" command.
>>>> To clarify it, I tried the following script from my submission
>>>> host:
>>>>
>>>> #---
>>>>
>>>> #!/bin/sh
>>>>
>>>> fatal_error() {
>>>>        echo "hi $1 $2"
>>>> }
>>>>
>>>> # trap expects signal names without the SIG prefix (POSIX)
>>>> trap 'fatal_error "Job has been terminated by the batch system" "TERM"' TERM
>>>> trap 'fatal_error "Job has been terminated by the batch system" "INT"' INT
>>>> trap 'fatal_error "Job has been terminated by the batch system" "QUIT"' QUIT
>>>> trap 'fatal_error "Job has been terminated by the batch system" "ABRT"' ABRT
>>>>
>>>> echo "OLA"
>>>>
>>>> # sleep in the background and wait: a trapped signal interrupts "wait",
>>>> # so the handler runs at once instead of after the sleep ends
>>>> sleep 2222222 &
>>>> wait $!
>>>>
>>>> #---
>>>>
>>>> I submitted this script to SGE and, once it had started running,
>>>> deleted the job with "qdel job_id". The standard output was:
>>>>
>>>> #---
>>>>
>>>> [lnlip01] ~ > cat test.sh.o38047
>>>>
>>>> #############
>>>> # ATTENTION: Running PROLOG for test.sh on Tue May 29 16:12:38 WEST 2007
>>>> # ATTENTION: Job test.sh, ID=38047 from user goncalo will be executed in host lfcomp03.lip.pt
>>>> #############
>>>>
>>>> OLA
>>>>
>>>> #############
>>>> # ATTENTION: Running EPILOG for test.sh on Tue May 29 16:13:45 WEST 2007
>>>> # ATTENTION: Job test.sh, ID=38047 from user goncalo ended on Tue May 29 16:13:45 WEST 2007
>>>> #############
>>>>
>>>> #---
>>>>
>>>> From this, I'm forced to conclude that qdel sends SIGKILL rather
>>>> than SIGTERM. Is this right? If so, what do we do about child
>>>> processes?
>>>>
>>>> Thanks in advance
>>>> Best Regards
>>>>
>>>> Goncalo
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net