Opened 2 months ago

#1607 new defect

Do not ignore SIGCHLD

Reported by: opoplawski Owned by:
Priority: normal Milestone:
Component: sge Version: 8.1.9
Severity: minor Keywords:
Cc:

Description

While testing out the credential handling by sge_qmaster, I found this:

04/06/2017 11:28:37|worker|vulcan7|E|could not store credentials for job 15 - command "/usr/share/gridengine/utilbin/lx-amd64/put_cred" failed with return code 10

This is because sge_qmaster is ignoring SIGCHLD and setting SA_NOCLDWAIT, so waitpid() fails with errno 10 (ECHILD): the child has already exited and been discarded, because we said we didn't care about its status.

This appears to date back quite a ways:

commit fd6c976608cbde90d95cfb6a04eaee793a60ce68
Author: adoerr <adoerr>
Date:   Wed Nov 3 10:53:39 2004 +0000

    *** empty log message ***

diff --git a/Changelog b/Changelog
index 482d358..57c8ee0 100644
--- a/Changelog
+++ b/Changelog
@@ -1,3 +1,9 @@
+AD-2004-11-03-0: Bugfix:    '-m a' qsub option did leave a zombie process
+                 Review:    EB
+                 Changed:   qmaster
+                 Issue:     1277
+                 Bugtraq:   5104789
+

but this completely breaks sge_peopen()/sge_peclose() functionality. Perhaps some mailing code will need to add the necessary waitpid() call.
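As a rough sketch of what such a waitpid() call could look like once SIGCHLD is no longer ignored: the helper below reaps one known child without blocking. try_reap_child() is a hypothetical name, not an existing SGE function, and how the mailing code would track its child PIDs is left as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not SGE code): try to reap one specific child
 * without blocking.
 * Returns 1 if the child was reaped (*exit_status is filled in),
 *         0 if the child is still running,
 *        -1 on error (errno as set by waitpid(), e.g. ECHILD). */
static int try_reap_child(pid_t pid, int *exit_status)
{
    int status = 0;
    pid_t w;

    /* pid <= 0 would select a process group or any child instead. */
    assert(pid > 0);

    do {
        w = waitpid(pid, &status, WNOHANG);
    } while (w < 0 && errno == EINTR);

    if (w == 0)
        return 0;               /* not finished yet; try again later */
    if (w < 0)
        return -1;
    if (exit_status != NULL)
        *exit_status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return 1;
}
```

Waiting on the specific PID rather than on -1 is deliberate here: it avoids stealing the exit status of children that sge_peclose() is expected to collect.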

Attachments (1)

sge-sigchld.patch (1.0 KB) - added by opoplawski 2 months ago.
Patch to not ignore SIGCHLD


Change History (1)

Changed 2 months ago by opoplawski

Patch to not ignore SIGCHLD
