[GE users] SGE Rescheduling

Reuti reuti at staff.uni-marburg.de
Thu Aug 3 12:33:17 BST 2006


Hi,

On 02.08.2006 at 23:14, Sreenath Nampally wrote:

> Hello,
>
> Could someone explain the sequence of events that happen in SGE  
> (both on qmaster and exec host)
> when a job is rescheduled  and suspended? What signals are sent to  
> the job ?

If the job gets suspended, it will get a SIGSTOP, which you can't
catch. But you could submit the job with -notify to get a warning
signal beforehand, which you can catch. Have a look at `man qsub`; you
can even redefine the signal, see `man sge_conf`, section execd_params.
But be aware that the signal will be sent to the whole process group,
and this might need proper handling in both the job script and the
compiled program.
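
As a rough illustration of catching the -notify warning, here is a minimal sketch. It assumes the job itself is a Python script and that the default warning signals described in `man qsub` are in use (SIGUSR1 ahead of a SIGSTOP, SIGUSR2 ahead of a SIGKILL). The names `state` and `install_notify_handlers` are made up for this example and are not part of SGE.

```python
import signal

# Flags set by the handlers; the job's main loop can poll them.
# (Hypothetical names, chosen only for this sketch.)
state = {"suspend_warned": False, "kill_warned": False}


def on_usr1(signum, frame):
    # Assumed default with -notify: SIGUSR1 arrives before the SIGSTOP.
    state["suspend_warned"] = True


def on_usr2(signum, frame):
    # Assumed default with -notify: SIGUSR2 arrives before the SIGKILL.
    state["kill_warned"] = True


def install_notify_handlers():
    # Register both handlers; call this once at job start.
    signal.signal(signal.SIGUSR1, on_usr1)
    signal.signal(signal.SIGUSR2, on_usr2)
```

The handlers only set flags, so the main loop decides when it is safe to flush or save state. Because the signal goes to the whole process group, any child processes started by the job will receive it too and need their own handling, otherwise the default action for SIGUSR1/SIGUSR2 terminates them.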

If you reschedule a job, it will be killed, and before this you can
also get a warning via -notify. But I think you will only get the
information about the kill, not the reason, i.e. that the job will be
rescheduled. Only during the next run can you test whether the
variable RESTARTED is set to 1. If you need more sophisticated
handling, you can also try the checkpointing interface.
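
The RESTARTED test could look roughly like this, again assuming a Python job script. The helper name `was_restarted` is hypothetical, and treating a missing variable as a first run is an assumption of this sketch.

```python
import os


def was_restarted(env=os.environ):
    # RESTARTED is the environment variable mentioned above;
    # anything other than "1" (including unset) counts as a first run here.
    return env.get("RESTARTED", "0") == "1"


if __name__ == "__main__":
    if was_restarted():
        print("restarted run: resume from saved state")
    else:
        print("first run: start from scratch")
```

What "resume from saved state" means is up to the job; the variable only tells you that this is not the first run.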

HTH - Reuti

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



