[GE users] sge_execd exits badly even killed cleanly

reuti reuti at staff.uni-marburg.de
Mon Jan 25 14:15:03 GMT 2010


On 25.01.2010 at 11:09, massot wrote:

> On Fri, Jan 22, 2010 at 07:46:36PM +0100, reuti wrote:
>> On 22.01.2010 at 17:54, massot wrote:
>>> I have a problem with sge_execd's behavior when it gets stopped. When
>>> killed with the TERM signal it's supposed to end cleanly, but that's
>>> not the case on my execution hosts.
>>> When I reboot or shut down a computer (hence send a TERM signal to
>>> sge_execd),
>> when you reboot or shut down the machine it's too late, as all
>> processes have already got the TERM/KILL signals (including network
>> and the like - how do you transfer the checkpoint file to shared
>> space?).
> In my case it's not too late if the job doesn't have too much data to
> save.
> On Debian the relevant shutdown steps occur in the following
> chronological order:
> * send sge_execd the TERM signal;
> * send the remaining processes the TERM signal (jobs run by SGE are in
>   this category);
> * wait for about 10 seconds;
> * send the remaining processes the KILL signal;
> * unmount NFS;
> * shut down the network.
> So if your job can save its data within 10 seconds, it's OK.
> I thought sge_execd would send the USR1 signal to jobs when it
> receives the TERM signal, and then jobs would block the TERM signal,
> save their data and stop cleanly.

No - at least not in my experience. USR1 is sent only for the timed
creation of the checkpoint files. No checkpoint is generated just
before the TERM/KILL. This is only possible with the application-level
interface, but as said, even then you have to trigger the creation on
your own.


> Actually, even if it worked the expected way, I'd prefer not to use
> the when=s flag in my checkpointing environment because it wouldn't
> let the job differentiate between periodic checkpoints and computer
> shutdown (since SGE always sends the same signal). Users will be able
> to decide whether they want to save data or not, based on the
> received signal.

Then the application-level interface will provide more options. Did  
you check the Howto:


Inside the checkpoint.sh you will need to send something like
`kill -usr1 -- -$job_pid` to send the signal to the complete process
group. The available variables are:

char *ckpt_variables[] = {

Did you find this documented anywhere? It would be another issue to
have these noted at a proper location. Some are mentioned in the
README.* files in $SGE_ROOT/ckpt, though.
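
A minimal sketch of what such a checkpoint.sh could look like, assuming
the checkpointing environment is set up to pass $job_pid as the first
argument and that the job's PID is also its process group ID. The
function name signal_job_group, the DRY_RUN switch and the argument
order are made up for illustration and are not part of SGE:

```shell
#!/bin/sh
# Hypothetical checkpoint.sh: signal the whole process group of a job.
# Assumed usage: checkpoint.sh <job_pid> [signal]   (default signal: USR1)

signal_job_group() {
    pgid=$1
    sig=${2:-USR1}

    # Accept only a plain positive integer as the process group ID.
    case $pgid in
        ''|*[!0-9]*)
            echo "usage: signal_job_group <job_pid> [signal]" >&2
            return 2
            ;;
    esac

    if [ -n "${DRY_RUN:-}" ]; then
        # Illustration aid: print the command instead of running it.
        echo "kill -s $sig -- -$pgid"
    else
        # A negative PID addresses every process in that process group;
        # "--" keeps the leading minus from being parsed as an option.
        kill -s "$sig" -- "-$pgid"
    fi
}

# When called as a script with arguments, act on them.
if [ "$#" -gt 0 ]; then
    signal_job_group "$@"
fi
```

Running it with DRY_RUN=1 set only prints the kill command, which is a
harmless way to check what would be sent before wiring the script into
a checkpointing environment.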

>> But there is an issue anyway (last entry, I just checked in 6.2u5
>> again):
>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=2045
> I agree with your report. Moreover, the e-mail from sge_execd is also
> sent when sge_execd gets restarted, which can be a very long time
> after it stopped.
> It will be really awkward to receive an e-mail saying that your
> process failed, a long time after it actually succeeded.

Yes, this is confusing. The email can arrive some time after your job
has already finished successfully.

>> When you use rerun and reschedule_unknown already, do you have any
>> need for the setup of a checkpointing environment?
> I think rerun and reschedule_unknown are not used if your job doesn't
> use a checkpointing environment with when=r flag, are they?
> Anyway, I need a checkpointing environment to have periodic backups.
> It's easier than having programs use alarm() and SIGALRM.

Okay, I see.

To summarize: to generate different signals (and even a checkpoint
before the shutdown of the execd), I think the only working option is
to use the application-level interface and suspend the queue on this
machine before it is shut down. This way you make sure the migrate
script gets called, and there you can send your application a
different signal than the one sent from the checkpointing script.
The migrate script is not called at all for the shutdown of the execd.
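
As a rough sketch of the job side of such a setup, assuming the
checkpointing script sends USR1 and the migrate script sends USR2:
this signal assignment, the LOG file and the work loop are illustrative
only, and the two kill lines merely simulate what the two scripts
would do.

```shell
#!/bin/sh
# Job-side sketch: tell a periodic checkpoint apart from a migration.
# Assumption: checkpoint script sends USR1, migrate script sends USR2.

LOG="${LOG:-${TMPDIR:-/tmp}/job_events.$$}"
: > "$LOG"
pending=""

# The traps only record which event arrived; the main loop reacts to it
# at a point where the job's state is consistent.
trap 'pending=periodic' USR1
trap 'pending=migrate' USR2

handle_pending() {
    case $pending in
        periodic)
            # Placeholder for writing a restart file and carrying on.
            echo "periodic checkpoint at step $step" >> "$LOG"
            ;;
        migrate)
            # Placeholder for a final save, then stop the work loop.
            echo "final checkpoint at step $step" >> "$LOG"
            running=0
            ;;
    esac
    pending=""
}

step=0
running=1
while [ "$running" -eq 1 ] && [ "$step" -lt 10 ]; do
    step=$((step + 1))
    # Demo only: simulate the checkpoint and migrate scripts.
    if [ "$step" -eq 2 ]; then kill -s USR1 "$$"; fi
    if [ "$step" -eq 4 ]; then kill -s USR2 "$$"; fi
    handle_pending
done
```

Keeping the trap bodies down to a single assignment means the actual
saving happens between work steps rather than in the middle of one.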

-- Reuti

