[GE issues] [Issue 3237] finishing a tightly integrated job takes too long

joga Joachim.Gabler at sun.com
Wed Feb 3 12:38:54 GMT 2010


User joga changed the following:

                What    |Old value                 |New value
             Assigned to|pollinger                 |joga
        Target milestone|---                       |6.2u6

------- Additional comments from joga at sunsource.net Wed Feb  3 04:38:52 -0800 2010 -------

The delay was introduced in 6.2u2
with the fix for IZ 2815: incomplete accounting for the last short tasks of a tightly integrated parallel job.

This fix required an extended protocol between qmaster and execd:
When the master task of a tightly integrated job exits,
all slave exec hosts are notified about the job end.
With the fix, qmaster now waits for all slave execds to report the end of all tasks
and the cleanup of the slave job container.

While the slave hosts are being notified, clean up, and send the final report for the slave container,
the master task's job finish report is sent repeatedly (once per load report interval).
Once all slave hosts have reported the job as finished, the next master task job report finally triggers the job end.

The problem is that the final job end is therefore delayed by the load report interval.
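
The timing described above can be illustrated with a small model. This is only a sketch, not Grid Engine code: the function name job_end_time and its parameters are hypothetical, and it assumes that the master task's finish report is first sent when the master task exits, is then resent exactly once per load report interval, and that a master report arriving at the same moment as the last slave report already counts. The 40 second interval is an illustrative value.

```python
import math


def job_end_time(master_exit, slave_report_times, load_report_interval):
    """Return when qmaster would end the job in this simplified model.

    master_exit: time at which the master task exits and its finish
        report is first sent (assumption of this sketch).
    slave_report_times: times at which each slave execd delivers its
        final slave container report.
    load_report_interval: period at which the master task's finish
        report is resent.
    """
    # qmaster can only end the job once every slave host has reported.
    last_slave = max(slave_report_times, default=master_exit)
    if last_slave <= master_exit:
        # Nothing to wait for: the first master report ends the job.
        return master_exit
    # The job ends with the first resent master report that arrives
    # at or after the last slave report.
    resends = math.ceil((last_slave - master_exit) / load_report_interval)
    return master_exit + resends * load_report_interval


if __name__ == "__main__":
    interval = 40  # seconds, illustrative value
    end = job_end_time(0, [2, 3], interval)
    print("last slave report at 3 s, job ends at", end, "s")
```

In this model the slaves finish their cleanup after 3 seconds, but the job does not end until the next master task report at 40 seconds, which is the delay reported in this issue.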


