[GE users] Re: Re: [GE users] Issue seen in 6.2U4 : memory values reported by SGE too low compared to top output on linux systems

jenny lulh at genomics.org.cn
Fri Aug 13 07:06:04 BST 2010



For example, take the following job: qstat reports its memory usage (vmem) as 3.887G, but on the execution host, top shows it using more than 100g.


# qstat -j 143871
==============================================================
job_number:                 143871
exec_file:                  job_scripts/143871
submission_time:            Wed Aug 11 11:05:05 2010
hard resource_list:         virtual_free=400G
usage    1:                 cpu=8:20:05:15, mem=1940633.01964 GBs, io=1422.26572, vmem=3.887G, maxvmem=4.064G


Mem:  1055302704k total, 833846592k used, 221456112k free,   110800k buffers
Swap: 104864276k total,    21452k used, 104842824k free, 514290156k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16441 b  15   0  103g 101g  364 S 1571.4 10.1   2172:50 grape63mer
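As a rough consistency check on the two outputs above: if the vmem value were the true size truncated to 32 bits (the hypothesis raised in the quoted post below, not something confirmed here), the true size would be 3.887 GiB plus some multiple of 4 GiB. The sketch below lists the candidates closest to top's VIRT figure. The function name `candidates_near` is made up for illustration, and the sketch assumes that qstat's "G" and top's "g" both mean GiB.

```python
GIB = 2**30
WRAP = 2**32  # 4 GiB, the range of a 32-bit byte counter (assumed cause)

def candidates_near(reported_gib, observed_gib, span=1):
    """List true sizes (GiB) that would wrap to reported_gib, nearest observed_gib."""
    # Number of whole 4 GiB wraps that best explains the gap between the two tools.
    k0 = round((observed_gib - reported_gib) * GIB / WRAP)
    return [reported_gib + k * WRAP / GIB
            for k in range(max(k0 - span, 0), k0 + span + 1)]

# qstat says vmem=3.887G, top says VIRT=103g
for size in candidates_near(3.887, 103):
    print(f"{size:.3f} GiB")
```

The middle candidate, 25 wraps, is about 103.887 GiB, which is within 1 GiB of the 103g shown by top. That is compatible with a 32-bit truncation but does not prove it.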






2010-08-13
________________________________
Jenny_Lu
lulh at genomics.org.cn
Tel:075525273811
Mobile:15986782583  62583
________________________________
From: rayson
Sent: 2010-08-13  04:20:06
To: users
Cc:
Subject: Re: [GE users] Issue seen in 6.2U4 : memory values reported by SGE too low compared to top output on linux systems
Hi Jenny,
What is the online usage of the job? You can get that from the output of
"qstat -j <job_id>" while the job is running.
Rayson
On Thu, Aug 12, 2010 at 7:21 AM, reuti <reuti at staff.uni-marburg.de> wrote:
> Hi,
>
> Am 12.08.2010 um 04:27 schrieb jenny:
>
>> I can confirm that both vmem and maxvmem values shown by qstat (and qacct) are modulo 2^32 in bytes, at least on lx24-amd64. Here is the output from qacct for a simple C program that calloc()-s some good deal of memory of various sizes:
>>
>> calloc 3 GiB:
>> cpu          3.540
>> mem          7.374
>> maxvmem      3.013G
>>
>> calloc 7 GiB:
>> cpu          8.120
>> mem          22.271
>> maxvmem      3.013G
>>
>> calloc 11 GiB:
>> cpu          12.710
>> mem          34.406
>> maxvmem      3.013G
>>
>> calloc 4 GiB:
>> cpu          4.730
>> mem          0.012
>> maxvmem      13.074M
>>
>> It looks like a bug in SGE to me - vmem's value is converted to 32-bit somewhere along the path (probably as early as in the shepherd). That results in an incorrect value for the time integral in "mem" as well.
>>
>> Has anybody else met the same problem?
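The qacct figures quoted above can be reproduced with simple arithmetic, assuming the counter wraps at 2^32 bytes as the post suggests. The sketch below is only an illustration of that hypothesis. The constant `OVERHEAD` (the vmem of the test program before it allocates anything) is an assumption, taken from the 4 GiB case, where the allocation itself wraps to zero and only 13.074M remains. The function name is hypothetical.

```python
GIB = 2**30
MIB = 2**20
# Assumed baseline vmem of the test program, read off the "calloc 4 GiB" result.
OVERHEAD = int(13.074 * MIB)

def wrapped_vmem(alloc_gib):
    """Bytes a 32-bit counter would show for alloc_gib GiB plus the baseline."""
    return (alloc_gib * GIB + OVERHEAD) % 2**32

for size in (3, 7, 11, 4):
    print(f"calloc {size} GiB -> {wrapped_vmem(size) / GIB:.3f} GiB")
```

Under these assumptions the 3, 7 and 11 GiB cases all come out at about 3.013 GiB, and the 4 GiB case leaves only the baseline of about 13 MiB, which matches the maxvmem values listed in the post.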
>
> is this a copy/paste of this post?
>
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247921
>
> -- Reuti
>
>
>
>>
>> 2010-08-11
>> Jenny_Lu
>> lulh at genomics.org.cn
>> Tel:075525273811
>> Mobile:15986782583  62583
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=273960
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>


