[GE users] Issue seen in 6.2U4 : memory values reported by SGE too low compared to top output on linux systems

rayson rayrayson at gmail.com
Fri Aug 13 04:21:52 BST 2010


We really need to see if qstat also has this bug.

If the online job usage shown by qstat is also incorrect, then we don't
need to trace the shepherd, the qmaster's accounting file I/O, or qacct.

Rayson



On Thu, Aug 12, 2010 at 4:23 PM, hawson <beckerjes at mail.nih.gov> wrote:
> On Thu, Aug 12, 2010 at 04:19:26PM -0400, rayson wrote:
>>Hi Jenny,
>>
>>What is the online job usage? You can get it from the output of
>>"qstat -j <job_id>" while the job is running.
>
> Just to chime in on this:  we've seen memory issues using 6.2u5 as well.
> I don't have any hard numbers handy, but the values reported by qstat
> and qacct are roughly half the amounts reported by 'top', 'ps', and
> similar tools.
>
> This is on an lx26-amd64 box with a custom-compiled version of SGE.
>
>>
>>Rayson
>>
>>
>>
>>On Thu, Aug 12, 2010 at 7:21 AM, reuti <reuti at staff.uni-marburg.de> wrote:
>>> Hi,
>>>
>>> On 12.08.2010 at 04:27, jenny wrote:
>>>
>>>> I can confirm that both the vmem and maxvmem values shown by qstat (and qacct) are reported in bytes modulo 2^32, at least on lx24-amd64. Here is the qacct output for a simple C program that calloc()-s memory blocks of various sizes:
>>>>
>>>> calloc 3 GiB:
>>>> cpu          3.540
>>>> mem          7.374
>>>> maxvmem      3.013G
>>>>
>>>> calloc 7 GiB:
>>>> cpu          8.120
>>>> mem          22.271
>>>> maxvmem      3.013G
>>>>
>>>> calloc 11 GiB:
>>>> cpu          12.710
>>>> mem          34.406
>>>> maxvmem      3.013G
>>>>
>>>> calloc 4 GiB:
>>>> cpu          4.730
>>>> mem          0.012
>>>> maxvmem      13.074M
>>>>
>>>> It looks like a bug in SGE to me: the vmem value is converted to 32 bits somewhere along the path (probably as early as in the shepherd). That also makes the time integral reported in "mem" incorrect.
>>>>
>>>> Has anybody else run into the same problem?
>>>
>>> is this a copy/paste of this post?
>>>
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247921
>>>
>>> -- Reuti
>>>
>>>
>>>
>>>>
>>>> 2010-08-11
>>>> Jenny_Lu
>>>> lulh at genomics.org.cn
>>>> Tel:075525273811
>>>> Mobile:15986782583  62583
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=273960
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>
>>
>>------------------------------------------------------
>>http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274080
>
> --
> Jesse Becker
> NHGRI Linux support (Digicon Contractor)
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274081
>
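
The modulo 2^32 pattern in the quoted qacct figures can be checked with a few lines of arithmetic. The following is only a sketch under stated assumptions, not SGE code: the helper names (wrap_u32, fmt, reported_maxvmem) are made up for illustration, the "G" and "M" suffixes are assumed to be binary multipliers (1024^3 and 1024^2), and the roughly 13.074 MiB baseline is taken from the 4 GiB case in the quoted output. Under those assumptions, wrapping the byte count to 32 bits gives the same maxvmem strings as in the quoted runs.

```python
GIB = 1 << 30
MIB = 1 << 20

# Assumption: process baseline taken from the quoted 4 GiB run (13.074M).
OVERHEAD = round(13.074 * MIB)


def wrap_u32(nbytes: int) -> int:
    """Keep only the low 32 bits, as an unsigned 32-bit variable would."""
    return nbytes & 0xFFFFFFFF


def fmt(nbytes: int) -> str:
    """Format a byte count with binary multipliers and three decimals."""
    if nbytes >= GIB:
        return f"{nbytes / GIB:.3f}G"
    return f"{nbytes / MIB:.3f}M"


def reported_maxvmem(alloc_gib: int) -> str:
    """Hypothetical model: true vmem is allocation + baseline, then wrapped."""
    return fmt(wrap_u32(alloc_gib * GIB + OVERHEAD))


if __name__ == "__main__":
    for gib in (3, 7, 11, 4):
        print(f"calloc {gib} GiB -> maxvmem {reported_maxvmem(gib)}")
```

With this model the 3, 7 and 11 GiB cases collapse to the same value because 3, 7 and 11 all leave a remainder of 3 when divided by 4, while the 4 GiB allocation wraps to zero and leaves only the baseline.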

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274171
