Opened 13 years ago
Closed 10 years ago
#507 closed defect (fixed)
IZ2552: No core dump if SGE daemons crash when admin_user != "root"
Reported by: | andreas | Owned by: | |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | sge | Version: | 6.1AR_snapshot3_6 |
Severity: | minor | Keywords: | kernel |
Cc: |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2552]
Issue #: 2552
Platform: All
Reporter: andreas (andreas)
Component: gridengine
OS: All
Subcomponent: kernel
Version: 6.1AR_snapshot3_6
CC: None defined
Status: REOPENED
Priority: P2
Resolution:
Issue type: DEFECT
Target milestone: ---
Assigned to: andreas (andreas)
QA Contact: andreas
URL:
* Summary: No core dump if SGE daemons crash when admin_user != "root"
Status whiteboard:
Attachments:

Date | Filename | Description | Submitted by |
---|---|---|---|
Fri Apr 11 08:10:00 -0700 2008 | libcore.so.gz | libcore.so for AMD64 Linux (application/x-gzip) | andreas |
Fri Apr 11 08:12:00 -0700 2008 | libcore.c | Source code for libcore.so (text/plain) | andreas |
Mon Apr 28 04:00:00 -0700 2008 | libcore.so.gz | libcore.so for lx24-ia64 (application/x-gzip) | andreas |
Mon Apr 28 04:01:00 -0700 2008 | libcore.so.gz | libcore.so for lx24-x86 (text/plain) | andreas |
Mon Apr 28 06:49:00 -0700 2008 | 2552.diff | Proposed patch (maintrunk) (text/plain) | andreas |
Tue May 13 02:23:00 -0700 2008 | build.sh | Build.sh that I used to build libcore.so from libcore.c attached earlier (text/plain) | andreas |

Issue 2552 blocks:
Votes for issue 2552:
Opened: Thu Apr 10 02:51:00 -0700 2008
------------------------

DESCRIPTION:
When SGE daemons crash, no core file gets written if admin_user != "root", due to security concerns.

WORKAROUND/FIX:
Under Solaris, coreadm(1M) can be used to give the kernel a waiver (per process or globally) so that core files get written in this case.
Under Linux there are two options:

(1) To override it for all processes there is

    # sysctl -w kernel.core_setuid_ok=1

It is mentioned in http://kbase.redhat.com/faq/FAQ_49_3652.shtm for RHEL3, so I would assume it works in RHEL4 as well.

(2) To override it individually there is the call

    prctl(PR_SET_DUMPABLE,1,42,42,42);

Since https://bugzilla.redhat.com/show_bug.cgi?id=104310 treats it as a bug when this call is broken, I would assume one can rely on it.

------- Additional comments from andreas Thu Apr 10 05:00:34 -0700 2008 -------
Use of prctl(PR_SET_DUMPABLE,1,42,42,42) under Linux seems problematic, as it would be necessary to issue this prctl() anew each time the uid/euid changes: http://linux-documentation.com/en/man/man2/prctl.html

------- Additional comments from andreas Thu Apr 10 05:38:01 -0700 2008 -------
The best approach to address this issue is to have the documentation explain how to still get the core file. The plan is to add a troubleshooting section to the 6.2 Install Guide that refers to coreadm(1M) and sysctl -w kernel.core_setuid_ok.

------- Additional comments from andreas Fri Apr 11 08:07:50 -0700 2008 -------
As it turned out, e.g. RHEL4 no longer knows

    # sysctl -w kernel.core_setuid_ok=1

so the only resort to get a core dump under Linux appears to be issuing

    prctl(PR_SET_DUMPABLE,1,42,42,42);

after each call to setuid(), seteuid(), setgid(), and setegid(). As a workaround, preloading libcore.so via LD_PRELOAD turned out to solve the issue. E.g. to apply it for sge_execd, one must change in $SGE_ROOT/$SGE_CELL/common/sgeexecd the line

    $bin_dir/sge_execd

where sge_execd is started into

    env LD_PRELOAD=/path/to/libcore.so $bin_dir/sge_execd

After an execd restart, a core.<pid> file is written in the spool directory $SGE_ROOT/$SGE_CELL/spool/<host>/ of this execd when it gets killed using

    # kill -SEGV <pid>

Note that LD_PRELOAD gets inherited by the shepherd processes that are forked by such an execd, but the jobs themselves will not have it in their environments unless INHERIT_ENV=LD_PRELOAD is added to the execd_params section of the cluster configuration sge_conf(5).

------- Additional comments from andreas Fri Apr 11 08:10:12 -0700 2008 -------
Created an attachment (id=164) libcore.so for AMD64 Linux

------- Additional comments from andreas Fri Apr 11 08:12:04 -0700 2008 -------
Created an attachment (id=165) Source code for libcore.so

------- Additional comments from andreas Mon Apr 28 04:00:50 -0700 2008 -------
Created an attachment (id=166) libcore.so for lx24-ia64

------- Additional comments from andreas Mon Apr 28 04:01:50 -0700 2008 -------
Created an attachment (id=167) libcore.so for lx24-x86

------- Additional comments from andreas Mon Apr 28 06:49:52 -0700 2008 -------
Created an attachment (id=168) Proposed patch (maintrunk)

------- Additional comments from andreas Wed Apr 30 06:47:05 -0700 2008 -------
Fixed in Maintrunk for Linux sge_execds.

------- Additional comments from andreas Tue May 13 02:23:05 -0700 2008 -------
Created an attachment (id=171) Build.sh that I used to build libcore.so from libcore.c attached earlier
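
For illustration, the "issue prctl() anew after each uid change" idea from the comments above might look roughly like the following on Linux. This is only a sketch: the helper names seteuid_keep_dumpable() and is_dumpable() are hypothetical, and the code is not taken from the attached libcore.c or 2552.diff, whose contents are not reproduced in this ticket. The trailing prctl() arguments are passed as 0 here, on the assumption that they are unused for PR_SET_DUMPABLE.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from libcore.c or 2552.diff).
 * Linux resets the "dumpable" flag when the effective uid changes,
 * so the flag is set again right after a successful seteuid().
 * Returns 0 on success, -1 if either call fails.
 */
int seteuid_keep_dumpable(uid_t euid)
{
    if (seteuid(euid) != 0)
        return -1;

    /* Arguments 3 to 5 are assumed to be unused for PR_SET_DUMPABLE. */
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0)
        return -1;

    return 0;
}

/* Returns the current dumpable flag; 1 means core dumps are allowed. */
int is_dumpable(void)
{
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```

A daemon would call such a helper wherever it currently calls seteuid() directly; setuid(), setgid(), and setegid() would need the same treatment. An LD_PRELOAD library presumably achieves the same effect without touching the daemon source, by wrapping those calls.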
Change History (1)
comment:1 Changed 10 years ago by dlove
- Resolution set to fixed
- Severity set to minor
- Status changed from new to closed