Opened 14 years ago

Last modified 9 years ago

#319 new defect

IZ1960: h_cpu not working for Tight Integrated jobs

Reported by: reuti Owned by:
Priority: normal Milestone:
Component: sge Version: 6.0u7
Severity: Keywords: Linux qmaster
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1960]

        Issue #:          1960
        Platform:         Other
        OS:               Linux
        Reporter:         reuti (reuti)
        Component:        gridengine
        Subcomponent:     qmaster
        Version:          6.0u7
        CC:               None defined
        Status:           NEW
        Priority:         P3
        Resolution:
        Issue type:       DEFECT
        Target milestone: ---
        Assigned to:      ernst (ernst)
        QA Contact:       ernst
        URL:
        Summary:          h_cpu not working for Tight Integrated jobs
        Status whiteboard:
        Attachments:

     Issue 1960 blocks:
   Votes for issue 1960:


   Opened: Sat Jan 14 08:24:00 -0700 2006 
------------------------


Submitting an MPICH job that runs an endless loop and generates load on the selected nodes:

$ qsub -l h_cpu=60 -pe mpich 2 test.sh
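
test.sh itself is not attached; a minimal version for such a tightly integrated MPICH setup could look like the following sketch, where the slot count $NSLOTS and the machine file $TMPDIR/machines are assumed to be provided by the mpich PE's start script, and /home/reuti/mpihello is the looping MPI program seen in the process trees below:

#!/bin/sh
# slot count and machine file come from the mpich PE's tight integration (start script)
mpirun -np $NSLOTS -machinefile $TMPDIR/machines /home/reuti/mpihello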

$ qstat -u reuti -g t
   7358 0.54333 test.sh  ... para@node39   MASTER
   7358 0.54333 test.sh  ... para@node41   SLAVE

On node41 this gives:

20300 ?        S      0:00  \_ sge_shepherd-7358 -bg
20301 ?        Ss     0:00  |   \_ /usr/sge/utilbin/lx24-x86/rshd -l
20302 ?        S      0:00  |       \_ /usr/sge/utilbin/lx24-x86/qrsh_starter /var/spool/sge/node41/
active_jobs/7358.1/1.node41
20303 ?        R      0:22  |           \_ /home/reuti/mpihello node39 35362   4amslave -p4yourname
node41 -p4rmrank 1
20304 ?        S      0:00  |               \_ /home/reuti/mpihello node39 35362   4amslave -p4yourname
node41 -p4rmrank 1

So far, so good. After 60 seconds:

20300 ?        S      0:00  \_ sge_shepherd-7358 -bg
20301 ?        Ss     0:00  |   \_ /usr/sge/utilbin/lx24-x86/rshd -l
20302 ?        Z      0:00  |       \_ [qrsh_starter] <defunct>
20304 ?        S      0:00 /home/reuti/mpihello node39 35362   4amslave -p4yourname node41 -
p4rmrank 1

and nothing changes after that. Only process 20303 generated load, and SGE still keeps the job, because it never realises that the h_cpu limit, enforced by the kernel via setrlimit, was reached by one process. On the head node of the parallel job the job script has already exited and is no longer in the process tree. So the desired behaviour would be to kill all slave tasks once the main script has finished.
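
The per-process nature of this limit can be reproduced outside of SGE. In the following plain-shell sketch (not taken from the report) the busy child exceeds the CPU limit and is killed by the kernel, while the idle child never accumulates CPU time and survives, so the enclosing wait blocks, much as SGE keeps waiting above:

$ ( ulimit -t 5                        # RLIMIT_CPU, inherited by the children
>   sh -c 'while :; do :; done' &      # busy child: killed by the kernel after ~5s of CPU time
>   sleep 600 &                        # idle child: never reaches the limit, keeps running
>   wait )                             # blocks on the idle child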

In some way, this might be related to:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=1681

   ------- Additional comments from reuti Sun Jan 15 04:37:16 -0700 2006 -------
On the head node of the parallel job, the PE stop script is executed as expected. This way any parallel library has a chance to shut down its daemons properly. After executing this stop script, however, SGE seems to wait forever. At this stage all the leftover qrsh processes should be shut down on the slave nodes.
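
For illustration only (not part of this report), a stop_proc_args script for a daemon-based library like LAM/MPI would typically do little more than halt the daemons and clean up, e.g.:

#!/bin/sh
# hypothetical stop_proc_args sketch for a LAM/MPI tight integration
lamhalt                     # shut down the lamd daemons started in start_proc_args
rm -f $TMPDIR/lamhosts      # remove the host file written by the start script (file name assumed)
exit 0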

   ------- Additional comments from reuti Mon Jan 16 00:35:10 -0700 2006 -------
Maybe this can be extended to an enhancement: with a tightly integrated parallel job that uses daemons, like LAM/MPI, a qdel will stop the qrsh processes *before* the PE stop script is executed. If this could be delayed to happen *after* the PE stop script, there would be a chance to shut down the daemons properly and get rid of semaphores and shared memory segments.

If SGE kills any remaining qrsh processes after the PE stop script in all cases, then killing them before the PE stop script could become an option in the PE setup, e.g. "kill_before_stop_proc_args TRUE/FALSE", for sites that need the current behavior.

Change History (0)
