[GE users] qdel implements a SIGKILL signal ?

Reuti reuti at staff.uni-marburg.de
Tue May 29 19:02:25 BST 2007


Am 29.05.2007 um 18:59 schrieb Daniel Templeton:

> Goncalo,
>
> By default, Grid Engine uses SIGKILL.  The terminate_method queue  
> attribute allows you to override that, though, with whatever signal  
> or script you want.  Children are found and killed via the  
> additional group id assigned to the job.

AFAIK killing via the additional group id only happens if it is enabled
in the SGE configuration (add "ENABLE_ADDGRP_KILL" to "execd_params";
this can have side effects on Linux). The default is to kill the
process group with something like:

kill -9 -- -pid

So the best option is to ensure that none of the started processes
leaves the job's process group.
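
To illustrate the last two points, here is a small stand-alone sketch
(plain shell, not SGE itself). It assumes a Linux host with setsid(1)
from util-linux; demo_dir, job.sh and the marker file names are made up
for the demo. A stand-in "job" starts one child that stays in its
process group and one that leaves it via setsid, then the whole group
is killed in the same way as above ("-s KILL" is just the portable
spelling of "-9").

```shell
#!/bin/sh
# Sketch only: simulates a process-group kill locally, without SGE.
demo_dir=$(mktemp -d)
export demo_dir

# Hypothetical stand-in for a job script.
cat > "$demo_dir/job.sh" <<'EOF'
#!/bin/sh
# This child stays in the job's process group.
(sleep 2; touch "$demo_dir/inside") &
# This child leaves the group by starting its own session.
setsid sh -c 'sleep 2; touch "$demo_dir/escaped"' &
wait
EOF

# Run the "job" as leader of its own session and process group.
setsid sh "$demo_dir/job.sh" &
leader=$!
sleep 1    # give both children time to start

# Negative PID = signal the whole group; "--" ends option parsing.
kill -s KILL -- "-$leader"
sleep 3    # long enough for any survivor to write its marker file

if [ -e "$demo_dir/inside" ];  then inside=alive;  else inside=killed;  fi
if [ -e "$demo_dir/escaped" ]; then escaped=alive; else escaped=killed; fi
echo "inside: $inside, escaped: $escaped"
rm -rf "$demo_dir"
```

If the assumptions hold, the child that stayed in the group never gets
to write its marker file, while the one started through setsid does.
That second kind of process is what would be left behind after a qdel.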

But anyway, I wouldn't start background tasks with & at all in SGE
jobs (at least csh seems to create a new process group for each
background task it starts).
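
As a sketch of the alternative: if the queue's terminate_method is
changed to SIGTERM (as Daniel describes) and the signal reaches the
whole process group like the default kill does, the job script can run
its work in the foreground and still get its trap executed. The
foreground child dies from the same signal, and the shell runs the
handler afterwards. The following simulates that locally with kill
instead of a real qdel; both conditions above are assumptions, and the
"exit 1" in fatal_error, demo_dir and the file names are additions for
the demo.

```shell
#!/bin/sh
# Sketch only: simulates a group-wide SIGTERM locally, without SGE.
demo_dir=$(mktemp -d)

# Hypothetical job script: same trap idea as in the original post,
# but the work runs in the foreground (no "&", no "wait").
cat > "$demo_dir/job.sh" <<'EOF'
#!/bin/sh
fatal_error() {
    echo "hi $1 $2"
    exit 1
}
trap 'fatal_error "Job has been terminated by the batch system" "TERM"' TERM
echo "OLA"
sleep 300
EOF

# Run the "job" as leader of its own process group, capture its stdout.
setsid sh "$demo_dir/job.sh" > "$demo_dir/out" 2>/dev/null &
leader=$!
sleep 1

# Assumed stand-in for qdel with terminate_method SIGTERM:
# deliver TERM to the whole group.
kill -s TERM -- "-$leader"
wait "$leader"
status=$?

output=$(cat "$demo_dir/out")
echo "$output"
echo "exit status: $status"
rm -rf "$demo_dir"
```

With the default SIGKILL nothing like this is possible, since SIGKILL
cannot be trapped; the handler only has a chance to run once a
catchable signal is configured.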

-- Reuti


> Daniel
>
> goncalo at lip.pt wrote:
>> Hi there,
>>
>> I have an important question about the SGE "qdel" command. To
>> clarify it, I tried to run the following script from my
>> submission host:
>>
>> #---
>>
>> #!/bin/sh
>>
>> trap 'fatal_error "Job has been terminated by the batch system" "TERM"' TERM
>> trap 'fatal_error "Job has been terminated by the batch system" "INT"' INT
>> trap 'fatal_error "Job has been terminated by the batch system" "QUIT"' QUIT
>> trap 'fatal_error "Job has been terminated by the batch system" "ABRT"' ABRT
>>
>> echo "OLA"
>>
>> fatal_error() {
>>         echo "hi $1 $2"
>> }
>>
>> sleep 2222222  &
>> wait $!
>>
>> #---
>>
>> Then I've submitted this script to SGE, and after it starts  
>> running, I deleted the job using "qdel job_id". The standard  
>> output produced was:
>>
>> #---
>>
>> [lnlip01] ~ > cat test.sh.o38047
>>
>> #############
>> # ATTENTION: Running PROLOG for test.sh on Tue May 29 16:12:38 WEST 2007
>> # ATTENTION: Job test.sh, ID=38047 from user goncalo will be executed in host lfcomp03.lip.pt
>> #############
>>
>> OLA
>>
>> #############
>> # ATTENTION: Running EPILOG for test.sh on Tue May 29 16:13:45 WEST 2007
>> # ATTENTION: Job test.sh, ID=38047 from user goncalo ended on Tue May 29 16:13:45 WEST 2007
>> #############
>>
>> #---
>>
>> From this, I'm forced to conclude that qdel sends SIGKILL and not
>> SIGTERM. Is this right? If so, how should we handle child
>> processes?
>>
>> Thanks in advance
>> Best Regards
>>
>> Goncalo
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>



