[GE users] Re: Issue seen in 6.2U4: memory values reported by SGE too low compared to top output on Linux systems

rayson rayrayson at gmail.com
Fri Aug 13 21:35:20 BST 2010



I think I have a fix. The bug seems to have been introduced in SGE 6.2u4,
although I have not tested with the u4 binary.
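
For anyone following along, here is a minimal Python sketch of the wraparound that the quoted qacct figures further down suggest. It is not SGE code and says nothing about where in SGE the truncation actually happens. The helper names and the 13.074 MiB baseline are assumptions made for illustration; the baseline is inferred from the quoted 4 GiB run, where only the process overhead appears to survive.

```python
# Sketch only: models a value squeezed into an unsigned 32-bit field.
# Not taken from the SGE sources; all names here are hypothetical.
GIB = 1024 ** 3
MIB = 1024 ** 2


def truncate_to_32_bits(nbytes: int) -> int:
    """Return nbytes as it would read back from an unsigned 32-bit field."""
    return nbytes % 2 ** 32


def fmt(nbytes: int) -> str:
    """Format bytes with three decimals and a G or M suffix, like the quoted output."""
    if nbytes >= GIB:
        return f"{nbytes / GIB:.3f}G"
    return f"{nbytes / MIB:.3f}M"


# Assumption: about 13.074 MiB of baseline virtual memory per test process.
OVERHEAD = round(13.074 * MIB)

# The four calloc sizes from the quoted test, in the order they were posted.
reported = {
    gib: fmt(truncate_to_32_bits(gib * GIB + OVERHEAD)) for gib in (3, 7, 11, 4)
}
for gib, value in reported.items():
    print(f"calloc {gib} GiB -> maxvmem {value}")

# Jenny's job: qstat shows vmem=3.887G while top shows VIRT of 103g.
# If the value wrapped, the real size is 3.887G plus a multiple of 4 GiB.
wraps = 103 // 4                      # number of whole 4 GiB wraps below 103g
estimate_gib = 3.887 + 4 * wraps
print(f"{wraps} wraps -> about {estimate_gib:.3f}G")
```

Under those assumptions the sketch yields the same maxvmem strings as the quoted runs, and an estimate for Jenny's job that is close to the 103g top shows, so both reports are consistent with a 32-bit wraparound.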

Rayson



On Fri, Aug 13, 2010 at 12:15 PM, andy <andreas.schwierskott at oracle.com> wrote:
>
> I also thought it was introduced with u5 and not with u4.
>
> Andy
>
>> The bug still exists in 6.2u5. It was originally reported for u5.
>>
>> --
>> Adam Tygart
>> Beocat Sysadmin
>>
>> On Fri, Aug 13, 2010 at 09:58, ron <ron_chen_123 at yahoo.com> wrote:
>>> Do both SGE 6.2u5 & SGE 6.2u6 have this problem?
>>>
>>> If SGE 6.2u5 still has this bug, then I will see if I can get this fixed.
>>>
>>>   -Ron
>>>
>>>
>>> --- On Fri, 8/13/10, jenny <lulh at genomics.org.cn> wrote:
>>>
>>> Take the following job, for example: qstat reports its memory usage as 3.887G, but top on the server shows more than 100g.
>>>
>>>
>>> # qstat -j 143871
>>> ==============================================================
>>> job_number:                 143871
>>> exec_file:                  job_scripts/143871
>>> submission_time:            Wed Aug 11 11:05:05 2010
>>> hard resource_list:         virtual_free=400G
>>> usage    1:                 cpu=8:20:05:15, mem=1940633.01964 GBs, io=1422.26572, vmem=3.887G, maxvmem=4.064G
>>>
>>>
>>> Mem:  1055302704k total, 833846592k used, 221456112k free,   110800k buffers
>>> Swap: 104864276k total,    21452k used, 104842824k free, 514290156k cached
>>>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> 16441 b  15   0  103g 101g  364 S 1571.4 10.1   2172:50 grape63mer
>>>
>>>
>>>
>>>
>>>
>>>
>>> 2010-08-13
>>> ________________________________
>>> Jenny_Lu
>>> lulh at genomics.org.cn
>>> Tel:075525273811
>>> Mobile:15986782583  62583
>>> ________________________________
>>> From: rayson
>>> Sent: 2010-08-13 04:20:06
>>> To: users
>>> Cc:
>>> Subject: Re: [GE users] Issue seen in 6.2U4 : memory values reported by SGE too low compared to top output on linux systems
>>> Hi Jenny,
>>> What is the online job usage? You can get that from the output of
>>> "qstat -j <job_id>" while the job is running.
>>> Rayson
>>> On Thu, Aug 12, 2010 at 7:21 AM, reuti <reuti at staff.uni-marburg.de> wrote:
>>>>   Hi,
>>>>
>>>>   On 12.08.2010 at 04:27, jenny wrote:
>>>>
>>>>>   I can confirm that both the vmem and maxvmem values shown by qstat (and qacct) are reported modulo 2^32 bytes, at least on lx24-amd64. Here is the qacct output for a simple C program that calloc()s blocks of memory of various sizes:
>>>>>
>>>>>   calloc 3 GiB:
>>>>>   cpu          3.540
>>>>>   mem          7.374
>>>>>   maxvmem      3.013G
>>>>>
>>>>>   calloc 7 GiB:
>>>>>   cpu          8.120
>>>>>   mem          22.271
>>>>>   maxvmem      3.013G
>>>>>
>>>>>   calloc 11 GiB:
>>>>>   cpu          12.710
>>>>>   mem          34.406
>>>>>   maxvmem      3.013G
>>>>>
>>>>>   calloc 4 GiB:
>>>>>   cpu          4.730
>>>>>   mem          0.012
>>>>>   maxvmem      13.074M
>>>>>
>>>>>   It looks like a bug in SGE to me: the vmem value is converted to 32 bits somewhere along the path (probably as early as in the shepherd). That also makes the time integral reported in "mem" incorrect.
>>>>>
>>>>>   Has anybody else met the same problem?
>>>>   Is this a copy/paste of this post?
>>>>
>>>>   http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247921
>>>>
>>>>   -- Reuti
>>>>
>>>>
>>>>
>>>>>   2010-08-11
>>>>>   Jenny_Lu
>>>>>   lulh at genomics.org.cn
>>>>>   Tel:075525273811
>>>>>   Mobile:15986782583  62583
>>>>   ------------------------------------------------------
>>>>   http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=273960
>>>>
>>>>   To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274355

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


