Fw: [GE users] SGE questions

john.li at mindspeed.com
Thu Mar 1 18:24:42 GMT 2007


Actually, I used qconf -mconf to add the statement below,

execd_params                 NOTIFY_KILL=INT,USE_QSUB_GID=true

This is the way I know to send an INT to a job.   I wish I could say,

execd_params                 NOTIFY_KILL=(INT TERM)   USE_QSUB_GID=true

so that SGE would first send INT and then TERM to finish killing the job.

Regarding your suggestion,

b) use a custom terminate_method, which will send a signal of your 
choice, wait some seconds and send the sigkill (both to the 
process group, e.g. kill -2 -- -$job_pid)

Could you give me some detail on where and how to implement this?

Thanks very much...






Reuti <reuti at staff.uni-marburg.de> wrote on 03/01/07 10:03 AM:
To: users at gridengine.sunsource.net
Reply-To: users at gridengine.sunsource.net
Subject: Re: [GE users] SGE questions

Hi,

On 01.03.2007 at 18:23, john.li at mindspeed.com wrote:

>
> Understood.   I actually tried the second rqs yesterday afternoon. 
> It works.
>
> This rqs is great.   It is a huge enhancement for me, an LSF user 
> for many years.
>
> One other wish: SGE kills a job too harshly.   I'm aware that I 
> can send a different kill signal by modifying execd_params.
> Still, I'm unable to do what LSF

You mean NOTIFY_KILL/NOTIFY_SUSP: these set the warning signals used by 
the -notify option, not the final kill.

> does when killing a job.
>
>        By default, sends a set of signals to kill the specified jobs. On UNIX,
>        SIGINT and SIGTERM are sent to give the job a chance to clean up before
>        termination, then SIGKILL is sent to kill the job. The time interval
>        between sending each signal is defined by the JOB_TERMINATE_INTERVAL.

You can:

a) use the -notify option to qsub and define a signal of your choice 
in SGE's configuration; you could put -notify in a default sge_request 
so that it is always applied

b) use a custom terminate_method, which sends a signal of your 
choice, waits some seconds and then sends SIGKILL (both to the 
process group, e.g. kill -2 -- -$job_pid)
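
A minimal sketch of what (b) could look like, assuming the job's processes 
share a process group led by $job_pid. The script path 
/usr/local/sge/graceful_kill.sh, the function names and the 10 second 
default grace period are illustrative assumptions, not part of SGE:

```shell
#!/bin/sh
# Hypothetical terminate_method script (sketch, not shipped with SGE).
# Assumed queue setting:
#   terminate_method /usr/local/sge/graceful_kill.sh $job_pid

# Send signal $1 to kill target $2 (a PID, or -PGID for a process group).
send_signal() {
    kill -s "$1" -- "$2" 2>/dev/null
}

# Succeeds while the target still exists (signal 0 only tests for existence).
target_alive() {
    kill -s 0 -- "$1" 2>/dev/null
}

# Send INT, wait up to $2 seconds, send TERM, wait again, then send KILL.
graceful_kill() {
    target=$1
    grace=${2:-10}
    for sig in INT TERM; do
        # Nothing left to signal: the job is already gone.
        send_signal "$sig" "$target" || return 0
        waited=0
        while [ "$waited" -lt "$grace" ]; do
            target_alive "$target" || return 0
            sleep 1
            waited=$((waited + 1))
        done
    done
    send_signal KILL "$target"
    return 0
}

# When called with a job PID, signal the whole process group (-PID).
if [ "$#" -ge 1 ]; then
    graceful_kill "-$1" "${2:-10}"
fi
```

Such a script would be hooked in with qconf -mq <queue>, setting 
terminate_method to the script path followed by $job_pid, which SGE 
substitutes before calling it. It only helps if the job itself traps INT 
or TERM and uses the grace period to clean up.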

-- Reuti


> We could recover some usable simulation information from a job killed
> by LSF, which first sends an INT to allow the job to clean up.   I think 
> SGE kills a job by sending only a single kill signal.
>
> Thanks,
>
>
> Reuti <reuti at staff.uni-marburg.de> wrote on 03/01/07 04:09 AM:
> To: users at gridengine.sunsource.net
> Reply-To: users at gridengine.sunsource.net
> Subject: Re: [GE users] SGE questions
>
> Hi,
>
> On 28.02.2007 at 19:50, john.li at mindspeed.com wrote:
>
> >
> > Sorry, everyone.   My resource quota set should be
> >
> > {
> >   name         testq
> >   description  "QUEUE testq resource quotas"
> >   enabled      TRUE
> >   limit        queues {testq} to slots=16
> >   limit        users {*} queues {testq} to slots=8
> > }
> >
> > I'm really sorry about this...
>
> The first rule grants access and hence ends the evaluation of further rules
> in this set. But you can define a second resource quota set
> (containing only the second of your rules), and then you should get
> the desired behavior.
>
> -- Reuti
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
