Opened 14 years ago

Last modified 9 years ago

#322 new defect

IZ1964: qdel fails to delete processes that spawn a new process group with interactive jobs

Reported by: andreas
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 5.3
Severity:
Keywords: execution
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1964]

        Issue #:      1964             Platform:     All      Reporter: andreas (andreas)
       Component:     gridengine          OS:        All
     Subcomponent:    execution        Version:      5.3         CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    pollinger (pollinger)
      QA Contact:     pollinger
          URL:
       * Summary:     qdel fails to delete processes that spawn a new process group with interactive jobs
   Status whiteboard:
      Attachments:

     Issue 1964 blocks:
   Votes for issue 1964:


   Opened: Thu Jan 19 03:59:00 -0700 2006 
------------------------


DESCRIPTION:
With interactive jobs/tasks started via qrsh/qsh/qlogin, Grid Engine lacks a
means to terminate processes that have spawned a new process group. qdel
appears to finish the job, but some of its processes remain running:

(1) Use qsh to start an xterm(1) under control of Grid Engine
(2) Run Grid Engine 'work' example job binary
    # work -change_pgrp -t 3600
(3) Use qdel to get rid of the job

---> the work process remains running and continues to utilize CPU
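
The effect can be reproduced outside Grid Engine. The following is a minimal
sketch, not Grid Engine code: a hypothetical "leader" script stands in for the
shell running under qsh, and a sleeping worker stands in for
'work -change_pgrp'. It assumes the worker changes its group with setpgrp(),
which may differ from what the real 'work' binary does. Signalling only the
leader's process group leaves the worker alive.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for the job's shell: it starts one worker that moves
# itself into a new process group, reports the worker's PID, then idles.
LEADER_SRC = """
import os, subprocess, sys, time
worker = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    stdout=subprocess.DEVNULL,
    preexec_fn=os.setpgrp,   # the worker leaves the leader's process group
)
print(worker.pid, flush=True)
time.sleep(60)
"""


def pid_alive(pid):
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)          # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True              # exists, but belongs to another user
    return True


def worker_survives_group_kill():
    """Kill the leader's process group and report whether the worker outlives it."""
    leader = subprocess.Popen(
        [sys.executable, "-c", LEADER_SRC],
        stdout=subprocess.PIPE,
        text=True,
        start_new_session=True,  # the leader becomes its own group leader
    )
    worker_pid = int(leader.stdout.readline())
    os.killpg(leader.pid, signal.SIGKILL)    # signal the leader's group only
    leader.wait()
    leader.stdout.close()
    time.sleep(0.2)
    survived = pid_alive(worker_pid)
    if survived:
        os.kill(worker_pid, signal.SIGKILL)  # clean up the stray worker
    return survived


if __name__ == "__main__":
    print("worker survived group kill:", worker_survives_group_kill())
```

The sketch only illustrates why a group-wide signal is not enough. It says
nothing about how the Grid Engine daemons actually deliver signals.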

SUGGESTED FIX:
Job process tracking based on the additional group id would require the
rshd/telnetd/rlogind/xterm/sshd binaries to be specially patched for N1GE so
that the additional group id is set properly. A patched rshd is already part
of the N1GE distribution, but it is not realistic to do the same for
telnetd/rlogind/xterm/sshd.
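
To make the additional group id idea concrete, here is a rough sketch of how
processes carrying a given supplementary GID could be found on Linux. It is
an illustration only and not the Grid Engine implementation: it assumes a
Linux-style /proc where each <pid>/status file has a "Groups:" line, and the
function names and the proc_root parameter are hypothetical. A process started
by a daemon that never set the job's GID would simply not show up in such a
scan.

```python
import os


def supplementary_gids(status_text):
    """Extract the GIDs from the 'Groups:' line of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return {int(g) for g in line.split()[1:]}
    return set()


def pids_with_gid(gid, proc_root="/proc"):
    """Return the sorted PIDs whose supplementary groups include gid."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue                      # skip non-process entries
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                if gid in supplementary_gids(fh.read()):
                    found.append(int(entry))
        except OSError:
            continue                      # process exited or is unreadable
    return sorted(found)
```

For example, pids_with_gid(20001) would list every visible process tagged
with GID 20001, where 20001 is an arbitrary value chosen for illustration.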

Thus the ideal solution appears to be changing the process tracking mechanism
so that it uses the process tree structure of a job rather than the
additional group id.

WORKAROUND:
Use one of the following:
(1) prevent the job from spawning a new process group
(2) use a pstree-based job termination method
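
Workaround (2) could look roughly like the sketch below. It is one possible
shape for a pstree-based terminate method, not a script shipped with Grid
Engine. All function names are hypothetical, and it assumes a ps(1) that
accepts the POSIX options "-e -o pid= -o ppid=". The tree is walked via parent
PIDs, so a process is found regardless of its process group.

```python
import os
import signal
import subprocess


def parse_ps(text):
    """Parse 'pid ppid' lines into a {pid: ppid} dict."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            table[int(fields[0])] = int(fields[1])
    return table


def descendants(root, table):
    """Return root and all PIDs below it in the tree, each parent before its children."""
    children = {}
    for pid, ppid in table.items():
        children.setdefault(ppid, []).append(pid)
    order, seen, stack = [], set(), [root]
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue                      # guards against self-parented entries
        seen.add(pid)
        order.append(pid)
        stack.extend(sorted(children.get(pid, [])))
    return order


def terminate_tree(root, sig=signal.SIGKILL):
    """Signal root and every descendant found in one ps snapshot."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid="],
        capture_output=True, text=True, check=True,
    ).stdout
    victims = descendants(root, parse_ps(out))
    for pid in reversed(victims):         # signal descendants before ancestors
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass                          # already gone
    return victims
```

Two limits of this sketch are worth keeping in mind: the ps snapshot is
racy, so a process forked after the snapshot is missed, and a process whose
parent has already exited is reparented and no longer appears under the job's
root.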

   ------- Additional comments from andreas Thu Jan 19 04:01:54 -0700 2006 -------
There was a related issue, 1519. A fix for it was delivered, but if a new
pgrp is spawned, a pstree-based terminate method is still required.
