[GE users] SGE 6.2-u2_1 qmaster segfault

crei crei at sun.com
Tue Jun 30 09:50:32 BST 2009


Hi,

The qping output is currently broken; see
http://gridengine.sunsource.net/issues/show_bug.cgi?id=2767

Perhaps this mail thread will help you:

http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=203098

Regards,

Christian


On 06/29/09 19:35, m0zes wrote:
> Hello Everyone,
> 
> I am wondering if this issue is something unique to my installation,
> or if someone else has seen something similar.
> 
> I am running SGE 6.2-u2_1 on Gentoo Linux hosts using kernel
> 2.6.27-r6. I am regularly seeing sge_qmaster segfault. I have been
> trying to chase down this issue, but haven't had much luck.
> 
> It seems that qmaster segfaults when some jobs exit, almost as if
> sge_shepherd is sending back bad responses. I have turned on
> debugging but see nothing out of the ordinary, just a segfault at
> the end. Running the application under strace gives the same
> result: a segfault at the end.
> 
> Today I have come across the qping application, and I am getting a
> "warning" output:
> 
> mozes@athena ~ $ qping -info athena.beocat 1000 qmaster 1
> 06/29/2009 12:29:39:
> SIRM version:             0.1
> SIRM message id:          1
> start time:               06/29/2009 12:05:48 (1246295148)
> run time [s]:             1431
> messages in read buffer:  0
> messages in write buffer: 0
> nr. of connected clients: 121
> status:                   1
> info:                     MAIN: E (1430.58) | signaler000: E (1425.00)
> | event_master000: E (0.03) | timer000: E (0.03) | worker000: E (0.14)
> | worker001: E (1.38) | listener000: E (0.23) | listener001: E (0.87)
> | scheduler000: E (5.03) | WARNING
> malloc:                   arena(0) |ordblks(1) | smblks(0) | hblksr(0)
> | hblhkd(0) usmblks(0) | fsmblks(0) | uordblks(0) | fordblks(0) |
> keepcost(0)
> Monitor:                  disabled
> 
> From that output, it looks like the main thread hasn't responded since
> startup, and the signaler isn't much better.
> 
> Throughout all of this the jobs continue to run and sge_shadowd
> restarts the segfaulted qmaster.
> 
> Is there anything I can try?
> 
> --
> Adam Tygart
> Beocat Sysadmin
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=204385
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
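
For reading the per-thread "info:" field in the quoted qping output, a
minimal Python sketch follows. It assumes the field keeps the layout
shown above ("name: state (number)" entries separated by "|", with a
trailing status word), and it reads the number in parentheses the way
the quoted message does, as seconds since the thread last reported.
The 60-second limit and the function names are hypothetical, chosen
only for illustration, and given the qping issue linked above the
values themselves may not be reliable.

```python
import re

# Info field copied from the quoted qping output, with the mail
# line wrapping and "> " quote markers removed.
INFO = (
    "MAIN: E (1430.58) | signaler000: E (1425.00) | event_master000: E (0.03) "
    "| timer000: E (0.03) | worker000: E (0.14) | worker001: E (1.38) "
    "| listener000: E (0.23) | listener001: E (0.87) "
    "| scheduler000: E (5.03) | WARNING"
)

# One entry looks like "worker001: E (1.38)": name, state letter, number.
ENTRY = re.compile(r"^(\w+):\s+(\w)\s+\(([\d.]+)\)$")


def parse_info(info):
    """Return ([(thread, state, seconds), ...], overall_status)."""
    threads, status = [], None
    for part in (p.strip() for p in info.split("|")):
        match = ENTRY.match(part)
        if match:
            threads.append(
                (match.group(1), match.group(2), float(match.group(3)))
            )
        elif part:
            status = part  # trailing token such as "WARNING"
    return threads, status


def stale_threads(threads, limit=60.0):
    """Names of threads whose number exceeds the (hypothetical) limit."""
    return [name for name, _state, secs in threads if secs > limit]


if __name__ == "__main__":
    threads, status = parse_info(INFO)
    for name, state, secs in threads:
        print(f"{name:<16} {state} {secs:>9.2f}")
    print("status:", status)
    print("over limit:", stale_threads(threads))
```

Run against the output quoted above, this singles out the MAIN and
signaler000 entries, which matches the observation in the original
message.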

-- 
Sun Microsystems GmbH             Christian Reissmann
Dr.-Leo-Ritter-Str. 7             Software Engineer
D-93049 Regensburg                Phone: +49 (0)941 3075 112
Germany                           Fax:   +49 (0)941 3075 222
http://www.sun.de                 mailto: Christian.Reissmann at sun.com
                                   http://www.sun.com/gridengine
Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=204551
