[GE users] qdel implements a SIGKILL signal ?

goncalo at lip.pt goncalo at lip.pt
Wed May 30 09:39:33 BST 2007



Hi Daniel and Reuti,

Thanks for the answers.
I'll try to implement your suggestions so that "qdel" sends
SIGTERM instead of SIGKILL. I'll report back...
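
For reference, a rough sketch of what the queue change might look like. It assumes a queue named all.q (substitute your own queue name). As far as queue_conf(5) describes it, terminate_method takes either a signal name beginning with "SIG" or the path to a script, so please check the man page for your SGE version before relying on this.

```
# Queue configuration fragment, as opened in an editor by: qconf -mq all.q
# (all.q is an assumed queue name; the attribute normally defaults to NONE,
#  which means the job is terminated with SIGKILL)
terminate_method      SIGTERM
```

With a setting like this, the trap handlers in the job script should get a chance to run when the job is deleted.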

Cheers
Goncalo
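
For anyone who wants to see the difference outside SGE, here is a minimal sketch in plain sh. Nothing in it is SGE-specific, and the function names child_job and run_and_signal are made up for illustration. It starts a child that traps TERM, signals it once with TERM and once with KILL, and collects whatever the child managed to print.

```shell
#!/bin/sh
# Sketch: a TERM trap gets to run, but SIGKILL cannot be trapped.

# Hypothetical stand-in for a job script: traps TERM, then waits on a sleep.
child_job() {
    trap 'echo "handler ran"; kill "$sleeper" 2>/dev/null; exit 0' TERM
    echo "started"
    sleep 5 &
    sleeper=$!
    wait "$sleeper"
}

# run_and_signal SIGNAL: start child_job in the background, send it SIGNAL,
# then print everything the child wrote to its stdout.
run_and_signal() {
    out=$(mktemp)
    child_job >"$out" 2>/dev/null &
    child=$!
    sleep 1                          # give the child time to install its trap
    kill -s "$1" "$child"
    wait "$child" 2>/dev/null || true
    cat "$out"
    rm -f "$out"
}

term_output=$(run_and_signal TERM)
kill_output=$(run_and_signal KILL)

echo "after TERM: $term_output"
echo "after KILL: $kill_output"
```

The TERM run should show both "started" and "handler ran", while the KILL run should show only "started". In the KILL case the background sleep is also left running for a few seconds after its parent is gone, which is the child-process problem discussed below.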



Quoting Reuti <reuti at staff.uni-marburg.de>:

> On 29.05.2007 at 18:59, Daniel Templeton wrote:
>
>> Goncalo,
>>
>> By default, Grid Engine uses SIGKILL.  The terminate_method queue
>> attribute allows you to override that, though, with whatever signal
>> or script you want.  Children are found and killed via the
>> additional group id assigned to the job.
>
> AFAIK this depends on a setting in the SGE configuration (add
> "ENABLE_ADDGRP_KILL" to "execd_params", with possible side effects on
> Linux). The default is to kill the process group with something like:
>
> kill -9 -- -pid
>
> So the best option is to ensure that none of the started processes
> jumps out of the job's process group.
>
> But anyway, I wouldn't start background tasks with & at all in SGE jobs
> (and at least csh seems to create a new process group for each
> background task it starts).
>
> -- Reuti
>
>
>> Daniel
>>
>> goncalo at lip.pt wrote:
>>> Hi there,
>>>
>>> I have an important question about the SGE "qdel" command. To
>>> clarify it, I tried running the following script from my
>>> submission host:
>>>
>>> #---
>>>
>>> #!/bin/sh
>>>
>>> trap 'fatal_error "Job has been terminated by the batch system" "TERM"' SIGTERM
>>> trap 'fatal_error "Job has been terminated by the batch system" "INT"' SIGINT
>>> trap 'fatal_error "Job has been terminated by the batch system" "QUIT"' SIGQUIT
>>> trap 'fatal_error "Job has been terminated by the batch system" "ABRT"' SIGABRT
>>>
>>> echo "OLA"
>>>
>>> fatal_error() {
>>>        echo "hi $1 $2"
>>> }
>>>
>>> sleep 2222222  &
>>> wait $!
>>>
>>> #---
>>>
>>> I then submitted this script to SGE and, once it had started
>>> running, deleted the job using "qdel job_id". The standard
>>> output produced was:
>>>
>>> #---
>>>
>>> [lnlip01] ~ > cat test.sh.o38047
>>>
>>> #############
>>> # ATTENTION: Running PROLOG for test.sh on Tue May 29 16:12:38 WEST 2007
>>> # ATTENTION: Job test.sh, ID=38047 from user goncalo will be executed in host lfcomp03.lip.pt
>>> #############
>>>
>>> OLA
>>>
>>> #############
>>> # ATTENTION: Running EPILOG for test.sh on Tue May 29 16:13:45 WEST 2007
>>> # ATTENTION: Job test.sh, ID=38047 from user goncalo ended on Tue May 29 16:13:45 WEST 2007
>>> #############
>>>
>>> #---
>>>
>>> From this, I'm forced to conclude that qdel sends SIGKILL
>>> rather than SIGTERM. Is this right? If so, what should we
>>> do about child processes?
>>>
>>> Thanks in advance
>>> Best Regards
>>>
>>> Goncalo
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>
>






