[GE users] Issue seen in 6.2U5 : memory values reported by SGE too low compared to top output on linux systems

icaci hristo at mc.phys.uni-sofia.bg
Thu Mar 11 00:17:55 GMT 2010


Hi,

I can confirm that both the vmem and maxvmem values shown by qstat (and qacct) are reported modulo 2^32 bytes, at least on lx24-amd64. Here is the qacct output for a simple C program that calloc()s blocks of memory of various sizes:

calloc 3 GiB:
cpu          3.540
mem          7.374
maxvmem      3.013G

calloc 7 GiB:
cpu          8.120
mem          22.271
maxvmem      3.013G

calloc 11 GiB:
cpu          12.710
mem          34.406
maxvmem      3.013G

calloc 4 GiB:
cpu          4.730
mem          0.012
maxvmem      13.074M

It looks like a bug in SGE to me: the vmem value is converted to 32 bits somewhere along the path (probably as early as in the shepherd). That also results in an incorrect value for the time integral in "mem".
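
As a rough illustration of the arithmetic (this is not SGE code, and where the truncation actually happens is still a guess), the sketch below keeps only the low 32 bits of a 64-bit byte count. The names truncate_to_uint32, format_size and BASELINE are made up for the example; BASELINE assumes that every run carries the same ~13.074 MiB of non-calloc address space seen in the 4 GiB run, and format_size only imitates the G/M suffixes in the qacct output above.

```python
# Sketch only: models a 64-bit byte count being squeezed into a 32-bit field.
# Nothing here is taken from the SGE sources.

GIB = 1 << 30
MIB = 1 << 20

# Assumption: baseline address space of the test program, taken from the
# maxvmem value reported for the 4 GiB run (13.074M).
BASELINE = round(13.074 * MIB)


def truncate_to_uint32(nbytes: int) -> int:
    """Keep only the low 32 bits, i.e. the value modulo 2**32."""
    return nbytes & 0xFFFFFFFF


def format_size(nbytes: int) -> str:
    """Hypothetical formatter imitating the G/M suffixes printed by qacct."""
    if nbytes >= GIB:
        return f"{nbytes / GIB:.3f}G"
    return f"{nbytes / MIB:.3f}M"


if __name__ == "__main__":
    for gib in (3, 7, 11, 4):
        real = gib * GIB + BASELINE
        seen = truncate_to_uint32(real)
        print(f"calloc {gib:2d} GiB: real {format_size(real):>8}  "
              f"truncated {format_size(seen):>8}")
```

Under these assumptions the truncated values for 3, 7 and 11 GiB all come out the same, and the 4 GiB case collapses to just the baseline, which is the pattern in the qacct numbers above.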

Hristo

On 10.03.2010, at 20:10, shruti_m wrote:

Hi Reuti/Stephen,

Actual data is being written to memory. Both VIRT and RES values are quite high compared to vmem reported by SGE. It is easily reproducible and consistent in behavior.

=======================================================================================================

top - 10:08:48 up 104 days, 13:52, 11 users,  load average: 1.00, 0.69, 0.30
Tasks: 124 total,   2 running, 122 sleeping,   0 stopped,   0 zombie
Cpu(s): 24.4% us,  0.7% sy,  0.0% ni, 74.8% id,  0.1% wa,  0.0% hi,  0.1% si
Mem:  16409180k total, 12688836k used,  3720344k free,    69944k buffers
Swap: 33557960k total,   127132k used, 33430828k free,   427984k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14517 shruti    25   0 11.7g  11g  23m R 99.9 73.2   5:13.35 common_shell_ex

=======================================================================================================
usage    1:                 cpu=00:04:44, mem=512.92204 GBs, io=0.00000, vmem=2.681G, maxvmem=3.997G
=======================================================================================================

Thanks,
Shruti


-----Original Message-----
From: stephendennis [mailto:sdennis at univaud.com]
Sent: Wednesday, March 10, 2010 6:21 AM
To: users at gridengine.sunsource.net
Subject: RE: [GE users] Issue seen in 6.2U5 : memory values reported by SGE too low compared to top output on linux systems

Hello Shruti

When you allocated the memory, did you also write data into it?
Until you do, the allocation is only virtual.

Stephen
________________________________________
From: shruti_m [shruti at synopsys.com]
Sent: Tuesday, March 09, 2010 6:37 PM
To: users at gridengine.sunsource.net
Subject: [GE users] Issue seen in 6.2U5 : memory values reported by SGE too low compared to top output on linux systems

Hi All,

We recently upgraded one of our sites to 6.2U5. Since the upgrade, we have noticed that the vmem and maxvmem values reported by qstat in SGE are much lower than the actual memory consumed by the job, as reflected in the top output on the system.

E.g., I submit a job that grabs 20G of memory and holds it for 300 sec. In the top output, I do see my job consuming up to 20G of memory for 300 sec, but the qstat output shows that maxvmem never exceeded 3G! It is easily reproducible on lx24-amd64 systems.

Let me know if anybody else has seen similar behavior.

Thanks,
Shruti


--
Dr Hristo Iliev
Monte Carlo research group
Faculty of Physics, University of Sofia
5 James Bourchier blvd, 1164 Sofia, Bulgaria
http://cluster.phys.uni-sofia.bg/hristo/



