[GE users] Scheduler died unexpectedly

Mulley, Nikhil Nikhil.Mulley at deshaw.com
Thu Jan 24 00:14:38 GMT 2008


[Perhaps the question is wrong, or it is simply wrong to expect schedd to dump a core? I see that many people have asked about core files in the past, but sadly none of them received a response. Is it going to be the same here? Please, no.]

[The subject of this discussion could well be turned into a plea to make the v6.1u4 binaries available, at least for Solaris.]

So it turned out to be a memory leak in the scheduler, since I see the scheduler dying at least once a day. Isn't it? I am using v6.1u3, by the way. This problem is documented in
http://gridengine.sunsource.net/issues/show_bug.cgi?id=2187, and the fix seems to be available in V61_BRANCH for v6.1u4.

Thanks, Andreas, for the changes.

I see that the v6.1u4 binaries are not yet available to the public. Are there any plans to make them public soon?

Andreas, may I ask you to provide the v6.1u4 binaries, at least for the solaris-amd64 and solaris-x86 architectures?

I will be very happy to test them in my environment, and I desperately want to avoid the nightmare of restarting the scheduler in v6.1u3 every time this happens.

Thanks,
Nikhil

-----Original Message-----
From: Mulley, Nikhil 
Sent: Sunday, January 13, 2008 4:27 PM
To: users at gridengine.sunsource.net
Subject: RE: [GE users] Scheduler died unexpectedly

Can I make schedd dump core the next time it dies? That would perhaps allow me to generate some post-mortem report.
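On making schedd dump core: one common approach is a small wrapper that raises the core-file size limit before starting the daemon, since resource limits are inherited across exec. This is a minimal Python sketch under assumptions: the wrapper and the example binary path $SGE_ROOT/bin/<arch>/sge_schedd are illustrative, not part of Grid Engine. Where the core file lands is governed by the OS (for example coreadm on Solaris, kernel.core_pattern on Linux) and by the daemon's working directory, and a daemon that changes its user id may still be prevented from dumping core.

```python
import os
import resource
import sys


def enable_core_dumps() -> tuple[int, int]:
    """Raise this process's core-file size soft limit to its hard limit.

    Limits are inherited across fork/exec, so a daemon exec'd afterwards
    can write a core file when it crashes. Raising the hard limit itself
    would need privileges, so it is left unchanged.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print(f"RLIMIT_CORE soft={soft} hard={hard}")
    # Hypothetical usage: exec the daemon given on the command line
    # (e.g. $SGE_ROOT/bin/<arch>/sge_schedd); it inherits the raised limit.
    if len(sys.argv) > 1:
        os.execv(sys.argv[1], sys.argv[1:])
```

Hypothetical usage: `python enable_core.py $SGE_ROOT/bin/sol-amd64/sge_schedd`, run from a directory the daemon can write to.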

-----Original Message-----
From: Andreas.Haas at Sun.COM [mailto:Andreas.Haas at Sun.COM] 
Sent: Friday, January 11, 2008 6:10 PM
To: users at gridengine.sunsource.net
Subject: RE: [GE users] Scheduler died unexpectedly

Hi Nikhil,

The best way is to run schedd under the control of dbx/gdb. That way you
don't need to care about a core dump to get the stack trace.

Note that you must have SGE_ND set in the environment to prevent schedd from daemonizing.
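The procedure above can be sketched as follows: set SGE_ND so schedd stays in the foreground, then start it under gdb (or dbx on Solaris). This is a minimal Python sketch under assumptions: the binary is taken to live at $SGE_ROOT/bin/<arch>/sge_schedd, the usual SGE environment (SGE_ROOT, SGE_CELL) is assumed to be set already by sourcing $SGE_ROOT/$SGE_CELL/common/settings.sh, only the presence of SGE_ND is assumed to matter (so the value "true" is arbitrary), and the helper name build_debug_command is hypothetical.

```python
import os
import shlex
import sys


def build_debug_command(sge_root: str, arch: str, debugger: str = "gdb"):
    """Return (argv, env) for running the scheduler in the foreground under a debugger.

    The binary location $SGE_ROOT/bin/<arch>/sge_schedd is an assumed layout.
    """
    binary = os.path.join(sge_root, "bin", arch, "sge_schedd")
    env = dict(os.environ)
    # SGE_ND keeps the daemon from detaching, so it stays under debugger control.
    env["SGE_ND"] = "true"
    if debugger == "gdb":
        # "-ex run" starts the program at once; after a crash, type "bt".
        argv = ["gdb", "-ex", "run", "--args", binary]
    elif debugger == "dbx":
        # Type "run" to start; after a crash, "where" prints the stack.
        argv = ["dbx", binary]
    else:
        raise ValueError(f"unsupported debugger: {debugger}")
    return argv, env


if __name__ == "__main__":
    arch = sys.argv[1] if len(sys.argv) > 1 else "sol-amd64"  # example arch
    argv, env = build_debug_command(os.environ.get("SGE_ROOT", "/opt/sge"), arch)
    # Print rather than run, so the command can be checked first.
    print("SGE_ND=" + env["SGE_ND"], shlex.join(argv))
```

The script only prints the command line; run it on the master host as the Grid Engine admin user (or root), wait for the crash, and then ask the debugger for the stack trace.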

Regards,
Andreas

On Thu, 10 Jan 2008, Mulley, Nikhil wrote:

> Is there a means of enabling scheduler debugging?
>
> -----Original Message-----
> From: Mulley, Nikhil
> Sent: Thursday, January 10, 2008 1:46 PM
> To: users at gridengine.sunsource.net
> Subject: [GE users] Scheduler died unexpectedly
>
> I want to look at why and how the scheduler died. I am using SGE
> v6.0.11. Can any (forensic) reports be generated showing why the
> scheduler died in the first place?
>
> The first thing I noticed was that the scheduler had died: the
> schedd.pid file on my qmaster host (as named in the act_qmaster file)
> referred to a non-existent pid. I was wondering why shadowd did not
> notice this and start schedd/qmaster on one of the shadow masters.
> Can this mechanism be expected from the host running shadowd?
>
> Thanks,
> Nikhil
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
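Regarding the stale schedd.pid mentioned above: whether a pid file still points at a running process can be checked by sending signal 0 to that pid, which tests for existence without delivering a signal. This is a minimal Python sketch; the helper names are hypothetical, the pid file path is whatever your installation uses, and the check is only meaningful on the host where the process ran (a recycled pid can also give a false "live"). It says nothing about how shadowd decides to fail over.

```python
import os
import sys


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this pid exists (signal 0 sends nothing)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single pid.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def pidfile_is_stale(path: str) -> bool:
    """Return True if the pid file is missing, unparsable, or names a dead process."""
    try:
        with open(path) as f:
            pid = int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return True
    return not pid_is_alive(pid)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {'stale' if pidfile_is_stale(path) else 'live'}")
```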

<°)))><

http://gridengine.info/

Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
