[GE users] qmaster and tightly integrated tasks

Thomas Neumann neumann at exasol.com
Fri Jan 20 15:13:27 GMT 2006

Hello!

While analysing a problem with one of my jobscripts, I came across the
following behaviour of the qmaster when running tightly integrated tasks:

If a tightly integrated task fails, the qmaster writes a message like the
following to its messages file:
'tightly integrated parallel task 20416.0 task 988.cmp1 fails - killing job'
A few seconds later the job is killed.

In some cases this behaviour is a problem for my scripts. For example,
my jobscript starts several processes with a predefined timeout. When a
timeout is reached, all subtasks are killed with SIGTERM or SIGKILL, but
the script is designed to react to the timeout condition and continue
execution (e.g. cleanup routines for shared memory, or other tasks to
execute). Unfortunately, the qmaster registers the killed subtasks and
terminates the job before it can complete.
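
A minimal sketch of the timeout pattern described above, written in Python
for illustration and not taken from the actual jobscript. The names
run_with_timeout, cleanup and grace_s are hypothetical, and the sketch
starts plain local child processes; it does not show how Grid Engine
launches tightly integrated tasks.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, grace_s=5.0):
    """Run cmd as a child process and enforce a timeout.

    On timeout the child first receives SIGTERM; if it is still alive
    after grace_s seconds it receives SIGKILL.
    Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX
            proc.wait()
        return proc.returncode, True


def cleanup():
    """Hypothetical placeholder for cleanup work (e.g. shared memory)."""
    print("running cleanup", file=sys.stderr)


def run_job(commands, timeout_s):
    """Run each command in turn; keep going after a timeout, then clean up.

    Returns the list of commands that had to be killed.
    """
    timed_out_cmds = []
    try:
        for cmd in commands:
            _, timed_out = run_with_timeout(cmd, timeout_s)
            if timed_out:
                # A killed subtask is an expected condition here, so the
                # script records it and continues.
                timed_out_cmds.append(cmd)
    finally:
        cleanup()
    return timed_out_cmds
```

In this sketch a killed subtask is treated as an expected event and the
remaining work still runs. The problem described above is that the
qmaster sees the same killed subtask as a failure and ends the whole job.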

Is there a way to prevent the qmaster from killing a job because of
failed tightly integrated subtasks?

