Opened 17 years ago

Last modified 6 years ago

#62 new defect

IZ336: h_vmem and shared memory segments

Reported by: lori Owned by:
Priority: low Milestone:
Component: sge Version: 5.3p1
Severity: minor Keywords: Sun SunOS execution
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=336]

        Issue #:      336              Platform:     Sun      Reporter: lori (lori)
       Component:     gridengine          OS:        SunOS
     Subcomponent:    execution        Version:      5.3p1       CC: reuti
        Status:       NEW              Priority:     P4
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    pollinger (pollinger)
      QA Contact:     pollinger
          URL:
       * Summary:     h_vmem and shared memory segments
   Status whiteboard:
      Attachments:

     Issue 336 blocks:
   Votes for issue 336:


   Opened: Fri Jul 26 03:34:00 -0700 2002 
------------------------


Hello,

it seems that the accounting of a job's actual memory usage
does not work correctly for parallel jobs that use shared
memory segments. One example is the chemistry package
GAUSSIAN. In parallel mode Gaussian starts n processes that
all attach the same shared memory segment. If I submit such
a job that uses 8 CPUs and 2 GB of memory in total, I must
request 8 CPUs and 16 GB of memory. Otherwise the job is
killed, with the message

job 35304 exceeds job hard limit "h_vmem" of queue "sunc19.q"

in the error file.

Andrea

   ------- Additional comments from lori Thu Aug 8 02:35:50 -0700 2002 -------
Gaussian uses shmat() to allocate the memory.
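
The effect is easy to reproduce outside Gaussian. A minimal sketch
(not Gaussian's code; the segment is scaled down to 64 MB here): one
SysV segment created with shmget() and attached via shmat() by four
forked workers. While it runs, top/ps show each worker's SIZE grown
by the full 64 MB, so summing per-process vmem counts the one
segment four times.

/* sketch: one shared segment, four attachers */
#include <stdio.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

#define SEGSIZE (64UL * 1024 * 1024)   /* stand-in for Gaussian's 2 GB */
#define NPROC   4

int main(void)
{
    int i, shmid = shmget(IPC_PRIVATE, SEGSIZE, IPC_CREAT | 0600);
    if (shmid == -1) { perror("shmget"); return 1; }

    for (i = 0; i < NPROC; i++) {
        if (fork() == 0) {
            /* every child attaches the SAME segment ... */
            void *p = shmat(shmid, NULL, 0);
            if (p == (void *)-1) { perror("shmat"); _exit(1); }
            /* ... yet each child's SIZE/SZ grows by SEGSIZE */
            sleep(60);                 /* inspect with top / ps -edalf */
            shmdt(p);
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;
    shmctl(shmid, IPC_RMID, NULL);     /* clean up the segment */
    return 0;
}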

   ------- Additional comments from andreas Thu Aug 8 05:01:15 -0700 2002 -------
1. What amount of memory (vmem) is reported for such jobs by

   # qstat -j <jobid>

2. What is the *total* memory amount (SIZE) reported by the 'top'
   utility for all(!) processes belonging to such a job?

3. What is the *total* memory amount (SZ) reported by 'ps -edalf'
   for all(!) processes belonging to such a job?


   ------- Additional comments from lori Mon Aug 12 00:53:05 -0700 2002 -------
1) usage    1:                  cpu=00:04:14, mem=261.07035 GBs,
io=0.00000, vmem=4.29G, maxvmem=4.29G

2)
PID   USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
15131 ae106lo    1   0    0 1088M   54M cpu/20   3:45  4.04% l502.exe
16495 ae106lo    1   0    0 1088M   10M cpu/13   0:37  3.49% l502.exe
16493 ae106lo    1   0    0 1088M   10M cpu/19   0:36  3.42% l502.exe
16494 ae106lo    1   0    0 1088M   10M cpu/0    0:36  3.36% l502.exe
16579 ae106lo    1  58    0 2928K 2248K cpu/23   0:01  0.11% top
16498 ae106lo    1  40    0  952K  784K sleep    0:00  0.00% yaksh
16501 ae106lo    1  48    0 2016K 1584K sleep    0:00  0.00% ksh
14860 ae106lo    1   0    0 2024K 1592K sleep    0:00  0.00% ksh
15066 ae106lo    1   0    0  952K  600K sleep    0:00  0.00% timex
14857 ae106lo    1  22    0  952K  784K sleep    0:00  0.00% yaksh
15130 ae106lo    1  40    0 1056K  896K sleep    0:00  0.00% sh
15070 ae106lo    1  50    0   36M 4040K sleep    0:00  0.00% g98

3)
F S      UID   PID  PPID  C PRI NI     ADDR     SZ    WCHAN    STIME TTY      TIME CMD
8 S  ae106lo 16501 16498  0  61 20        ?    252        ? 09:37:22 pts/1    0:00 -ksh -p
8 S  ae106lo 15070 15066  0  49 20        ?   4554        ? 09:33:54 ?        0:00 g98
8 O  ae106lo 18821 15131  2  99 20        ? 139273          09:44:29 ?        0:08 /rwthfs/rz/SW/GAUSSIAN/ultra3/g98/l
8 S  ae106lo 14857 14722  0  77 20        ?    119        ? 09:33:53 ?        0:00 -yaksh /w0/sge/sunc00/job_scripts/4
8 O  ae106lo 18823 15131  2  99 20        ? 139273          09:44:29 ?        0:08 /rwthfs/rz/SW/GAUSSIAN/ultra3/g98/l
8 S  ae106lo 16498 16496  0  59 20        ?    119        ? 09:37:22 pts/1    0:00 -yaksh
8 S  ae106lo 14860 14857  0  99 20        ?    253        ? 09:33:53 ?        0:00 -ksh -p /w0/sge/sunc00/job_scripts/
8 O  ae106lo 18822 15131  2  99 20        ? 139273          09:44:29 ?        0:08 /rwthfs/rz/SW/GAUSSIAN/ultra3/g98/l
8 S  ae106lo 15130 15070  0  59 20        ?    132        ? 09:33:54 ?        0:00 sh -c /rwthfs/rz/SW/GAUSSIAN/ultra3
8 S  ae106lo 15066 14860  0  99 20        ?    119        ? 09:33:54 ?        0:00 timex g98
8 O  ae106lo 15131 15130  4  99 20        ? 139273          09:33:54 ?       10:21 /rwthfs/rz/SW/GAUSSIAN/ultra3/g98/l

At first view everything looks all right, but each l502.exe process
shows 1088M, which is the shared memory: every process reports it,
although it is allocated only once. 4 x 1088M is roughly 4.25G,
which matches the reported vmem=4.29G, so the single segment is
counted four times.

   ------- Additional comments from andreas Wed Mar 3 06:50:52 -0700 2004 -------
Fix will not make it into 6.0.

Problem: It is unclear how the PDC component in sge_execd
would have to be rewritten so that shared memory segments are
counted only once for such jobs. Anyone got an idea?
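
One possible direction (a sketch only, not actual PDC code, and
Linux-flavored): when summing vmem over a job's process list, key
every shared mapping by its device:inode pair and charge it to the
job only once. The /proc/<pid>/maps format assumed below is Linux's,
where shared mappings carry an 's' permission flag and SysV segments
show up as "/SYSV..." with the shmid as inode; the Solaris PDC would
need the same information from its own /proc interface.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

#define MAXSEG 1024

/* shared mappings already charged to this job, keyed by dev:inode */
static struct { unsigned maj, min; unsigned long ino; } seen[MAXSEG];
static int nseen;

static int counted_before(unsigned maj, unsigned min, unsigned long ino)
{
    int i;
    for (i = 0; i < nseen; i++)
        if (seen[i].maj == maj && seen[i].min == min && seen[i].ino == ino)
            return 1;
    if (nseen < MAXSEG) {
        seen[nseen].maj = maj; seen[nseen].min = min; seen[nseen].ino = ino;
        nseen++;
    }
    return 0;
}

/* add one process's mappings to the job-wide vmem total */
static unsigned long add_vmem(pid_t pid, unsigned long total)
{
    char path[64], perms[8], line[512];
    unsigned long start, end, off, ino;
    unsigned maj, min;
    FILE *fp;

    snprintf(path, sizeof path, "/proc/%d/maps", (int)pid);
    if ((fp = fopen(path, "r")) == NULL)
        return total;
    while (fgets(line, sizeof line, fp)) {
        if (sscanf(line, "%lx-%lx %7s %lx %x:%x %lu",
                   &start, &end, perms, &off, &maj, &min, &ino) != 7)
            continue;
        /* perms[3] == 's' marks a shared mapping */
        if (perms[3] == 's' && ino != 0 && counted_before(maj, min, ino))
            continue;                  /* charge each segment once */
        total += end - start;
    }
    fclose(fp);
    return total;
}

int main(int argc, char **argv)
{
    unsigned long total = 0;
    int i;
    for (i = 1; i < argc; i++)         /* pids of one job's processes */
        total = add_vmem((pid_t)atoi(argv[i]), total);
    printf("job vmem: %.1f MB\n", total / (1024.0 * 1024.0));
    return 0;
}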

   ------- Additional comments from andreas Thu Mar 4 03:46:57 -0700 2004 -------
Will not be done for 6.0 beta

   ------- Additional comments from andreas Mon Mar 29 04:52:09 -0700 2004 -------
Reopened.

   ------- Additional comments from andreas Mon Mar 29 05:56:19 -0700 2004 -------
Lowered priority.

   ------- Additional comments from pollinger Tue Dec 6 03:42:55 -0700 2005 -------
Reassigned to execution category

   ------- Additional comments from andreas Thu Feb 8 03:46:16 -0700 2007 -------
I have not dug into the details yet, but I believe it would be
comparably easy to write a small DTrace script that tracks all
changes in shared memory segment utilization on a per-node basis. If
such a monitor existed, sge_execd(1) could be enhanced so that the
shared memory amount is covered correctly in the per-job "h_vmem"
resource usage.

Anyone interested in investigating this please let me know.

Cheers,
Andreas
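
For the record, a minimal sketch of such a per-node monitor. It
assumes the kernel's shmat()/shmdt() routines are visible to the fbt
provider under exactly these names, which should be verified first
with 'dtrace -l | grep shm'; a real monitor would also have to map
pids to SGE jobs and track segment sizes.

#!/usr/sbin/dtrace -s
/* sketch: log every SysV shm attach/detach on this node */
#pragma D option quiet

fbt::shmat:entry
{
    /* arg0 is assumed to be the shmid argument */
    printf("%Y pid=%d (%s) shmat shmid=%d\n",
           walltimestamp, pid, execname, (int)arg0);
}

fbt::shmdt:entry
{
    printf("%Y pid=%d (%s) shmdt\n", walltimestamp, pid, execname);
}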

   ------- Additional comments from reuti Wed Mar 14 03:32:10 -0700 2007 -------
In some way this seems related to
http://gridengine.sunsource.net/issues/show_bug.cgi?id=1254
(whether or not to multiply the limits).

BTW: Gaussian no longer uses shared memory and fork for
parallelization on one node, but OpenMP (for inter-node
communication Linda is still necessary).

Change History (2)

comment:1 Changed 8 years ago by dlove

  • Severity set to minor

Probably a duplicate of #1301, but with specific info.

comment:2 Changed 6 years ago by dlove

Should be fixed on Linux (see #1301).
