[GE users] Re: Re: [GE users] Issue seen in 6.2U4 : memory values reported by SGE too low compared to top output on linux systems

andy andreas.schwierskott at oracle.com
Fri Aug 13 17:15:20 BST 2010


Hi,

I guess that's

   CR 6934090 wrong vmem reporting of qstat and qacct on Linux

(CRs, or Change Requests, are IDs in Sun's bug database, to which
contract customers have access.)

It's fixed in SGE 6.2u6. See here:

    http://gridengine.sunsource.net/project/gridengine/62patches.txt

I also thought it was introduced with u5 and not with u4.
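
For anyone who wants to check their own numbers against the modulo 2^32 explanation quoted below, here is a minimal sketch of the arithmetic in Python. It is not taken from the SGE sources. The helper names (wrapped, candidates) are made up for illustration, and the fixed process overhead is an assumption read off the 13.074M figure from the 4 GiB calloc run.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
WRAP = 2 ** 32  # 4 GiB: range of an unsigned 32-bit byte counter


def wrapped(true_bytes):
    """Value left after truncating a byte count to 32 bits."""
    return true_bytes % WRAP


def candidates(reported_bytes, limit_bytes):
    """All true sizes up to limit_bytes that would wrap to reported_bytes."""
    return list(range(reported_bytes, limit_bytes + 1, WRAP))


# Assumption: constant overhead of the test process, taken from the
# 4 GiB run, where qacct showed maxvmem = 13.074M.
overhead = int(13.074 * MIB)

# The four calloc sizes from the quoted qacct output.
for gib in (3, 7, 11, 4):
    shown = wrapped(gib * GIB + overhead)
    print(f"calloc {gib:2d} GiB -> reported {shown / GIB:.3f}G")

# Sizes up to 110 GiB that would show up as vmem=3.887G in qstat.
for size in candidates(int(3.887 * GIB), 110 * GIB):
    print(f"{size / GIB:.3f}G")
```

Under that assumption the 3, 7 and 11 GiB runs all come out at 3.013G and the 4 GiB run at about 13M, which matches the quoted qacct output. The candidate list for vmem=3.887G includes a value of roughly 103.9G, which is in line with the 103g VIRT that top shows for the job.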

Andy

> The bug still exists in 6.2u5. It was originally reported for u5.
>
> --
> Adam Tygart
> Beocat Sysadmin
>
> On Fri, Aug 13, 2010 at 09:58, ron<ron_chen_123 at yahoo.com>  wrote:
>> Do both SGE 6.2u5 and SGE 6.2u6 have this problem?
>>
>> If SGE 6.2u5 still has this bug, then I will see if I can get this fixed.
>>
>>   -Ron
>>
>>
>> --- On Fri, 8/13/10, jenny<lulh at genomics.org.cn>  wrote:
>>
>> For example, for the following job qstat reports a memory usage of 3.887G, but top on the server reports more than 100G.
>>
>>
>> # qstat -j 143871
>> ==============================================================
>> job_number:                 143871
>> exec_file:                  job_scripts/143871
>> submission_time:            Wed Aug 11 11:05:05 2010
>> hard resource_list:         virtual_free=400G
>> usage    1:                 cpu=8:20:05:15, mem=1940633.01964 GBs, io=1422.26572, vmem=3.887G, maxvmem=4.064G
>>
>>
>> Mem:  1055302704k total, 833846592k used, 221456112k free,   110800k buffers
>> Swap: 104864276k total,    21452k used, 104842824k free, 514290156k cached
>>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 16441 b  15   0  103g 101g  364 S 1571.4 10.1   2172:50 grape63mer
>>
>>
>> 2010-08-13
>> ________________________________
>> Jenny_Lu
>> lulh at genomics.org.cn
>> Tel:075525273811
>> Mobile:15986782583  62583
>> ________________________________
>> From: rayson
>> Sent: 2010-08-13  04:20:06
>> To: users
>> Cc:
>> Subject: Re: [GE users] Issue seen in 6.2U4 : memory values reported by SGE too low compared to top output on linux systems
>> Hi Jenny,
>> What is the online job usage? You can get that from the output of
>> "qstat -j <job_id>" while the job is running.
>> Rayson
>> On Thu, Aug 12, 2010 at 7:21 AM, reuti<reuti at staff.uni-marburg.de>  wrote:
>>>   Hi,
>>>
>>>   Am 12.08.2010 um 04:27 schrieb jenny:
>>>
>>>>   I can confirm that both the vmem and maxvmem values shown by qstat (and qacct) are reported in bytes modulo 2^32, at least on lx24-amd64. Here is the qacct output for a simple C program that calloc()s blocks of memory of various sizes:
>>>>
>>>>   calloc 3 GiB:
>>>>   cpu          3.540
>>>>   mem          7.374
>>>>   maxvmem      3.013G
>>>>
>>>>   calloc 7 GiB:
>>>>   cpu          8.120
>>>>   mem          22.271
>>>>   maxvmem      3.013G
>>>>
>>>>   calloc 11 GiB:
>>>>   cpu          12.710
>>>>   mem          34.406
>>>>   maxvmem      3.013G
>>>>
>>>>   calloc 4 GiB:
>>>>   cpu          4.730
>>>>   mem          0.012
>>>>   maxvmem      13.074M
>>>>
>>>>   It looks like a bug in SGE to me: the vmem value is converted to 32 bits somewhere along the path (probably as early as in the shepherd). That results in an incorrect value for the time integral in "mem" as well.
>>>>
>>>>   Has anybody else met the same problem?
>>>   is this a copy/paste of this post?
>>>
>>>   http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247921
>>>
>>>   -- Reuti
>>>
>>>
>>>
>>>>   2010-08-11
>>>>   Jenny_Lu
>>>>   lulh at genomics.org.cn
>>>>   Tel:075525273811
>>>>   Mobile:15986782583  62583
>>>   ------------------------------------------------------
>>>   http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=273960
>>>
>>>   To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274080
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274304

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274320
