[GE users] execd behaviour in case of qmaster crash

rayson rayrayson at gmail.com
Thu Jun 11 16:49:03 BST 2009


On 6/11/09, ah_sunsource <ahaupt at ifh.de> wrote:
> Hi again,
>
> I could reproduce the behaviour. With some jobs the qmaster consumes
> really *huge* amounts of memory and crashes the system. That's the last
> thing I saw in "top".

That sounds like a memory leak -- check the issue database to see whether
the problem has already been reported.
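
One way to confirm a leak before the host starts swapping is to log the
resident set size of sge_qmaster at intervals and see whether it keeps
growing. Below is a minimal sketch, assuming a Linux master host with /proc
mounted (the quoted top output suggests Linux). The helper names rss_kb and
watch_rss are made up for illustration and are not part of Grid Engine.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of Grid Engine.

# Print the resident set size (in kB) of the given PID, read from /proc.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print "<epoch seconds> <rss in kB>" for PID $1: $2 samples, $3 seconds apart.
watch_rss() {
    pid=$1 count=$2 interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # Stop early if the process has gone away.
        rss=$(rss_kb "$pid") || return 1
        echo "$(date +%s) $rss"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, something like
watch_rss "$(pgrep -x sge_qmaster)" 60 10 >> /tmp/qmaster_rss.log
would take one sample every 10 seconds for about 10 minutes. Numbers that
rise steadily while only a few jobs are submitted would point at a leak
rather than at an undersized master host.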

Rayson



>
> top - 16:21:04 up 29 min,  1 user,  load average: 1.35, 0.34, 0.11
> Tasks:  83 total,   2 running,  81 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.3%us, 46.3%sy,  0.0%ni,  8.7%id, 44.6%wa,  0.0%hi,  0.0%si,  0.1%st
> Mem:   4194304k total,  4185452k used,     8852k free,      260k buffers
> Swap:  1052216k total,  1052216k used,        0k free,     1468k cached
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  2808 sge       15   0 5103m 3.8g  548 S 41.6 96.1   0:44.50 sge_qmaster
>
> Is there a way to limit the memory consumption of the qmaster process
> somehow? Or is there a recommendation for how much memory a master host
> should have installed to avoid swapping?
>
> Cheers,
> Andreas
>
> On Thu, 2009-06-11 at 09:57 +0200, ah_sunsource wrote:
> > Hi Rayson,
> >
> > On Thu, 2009-06-11 at 02:10 -0500, rayson wrote:
> > > Hi,
> > >
> > > So what is the exact state of the master when this happens? Is the
> > > machine up but the qmaster process dead?
> >
> > The host is still up, but I cannot log in any more. The qmaster process
> > still seems to be running, and its TCP socket is still reachable:
> >
> > [hpbl1] ~ # telnet lolek-vm1 sge_qmaster
> > Trying 141.34.32.95...
> > Connected to lolek-vm1.
> > Escape character is '^]'.
> > ^]
> > telnet> quit
> > [hpbl1] ~ # getent services sge_qmaster
> > sge_qmaster           538/tcp
> > [hpbl1] ~ # qping -info lolek-vm1 538 qmaster 1
> > endpoint lolek-vm1.ifh.de/qmaster/1 at port 538: can't find connection
> >
> > Any communication simply hangs.
> >
> > Cheers,
> > Andreas
>
> --
> | Andreas Haupt             | E-Mail: andreas.haupt at desy.de
> |  DESY Zeuthen             | WWW:    http://www-zeuthen.desy.de/~ahaupt
> |  Platanenallee 6          | Phone:  +49/33762/7-7359
> |  D-15738 Zeuthen          | Fax:    +49/33762/7-7216
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201552
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201558
