[GE users] qmaster SEGVs

mhanby mhanby at uab.edu
Tue May 4 15:17:48 BST 2010


I haven't found a solution. My SEGVs started in 6.2u4 and continued after upgrading to 6.2u5.

For me, the crashes always seem to start after a reboot. After several crashes, qmaster seems to stabilize for a while (days or weeks) before the crashes start again.

My workaround is a Nagios event handler that starts qmaster back up whenever it isn't running.
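
For anyone wanting to try the same workaround, here is a rough sketch of what such an event handler could look like. It is not my exact script. The init script path, the function names, and the "act on the third soft retry" rule are all assumptions to adapt locally. The three arguments are the standard Nagios macros $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios event handler that starts sge_qmaster when its check fails."""
import subprocess
import sys

# Assumption: location of the qmaster init script; adjust to the local install.
START_CMD = ["/etc/init.d/sgemaster", "start"]


def should_restart(state, state_type, attempt, soft_attempt=3):
    """Decide from the Nagios state macros whether a restart is warranted."""
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    # Hypothetical policy: also act on the Nth SOFT retry, before the state goes HARD.
    return state_type == "SOFT" and int(attempt) >= soft_attempt


def handle(state, state_type, attempt, runner=subprocess.call):
    """Run the start command if needed; return True when a restart was attempted."""
    if should_restart(state, state_type, attempt):
        runner(START_CMD)
        return True
    return False


# Only act when invoked the way Nagios would call it: STATE STATETYPE ATTEMPT.
if __name__ == "__main__" and len(sys.argv) == 4:
    handle(*sys.argv[1:4])
```

The Nagios command definition would pass the macros in that order, e.g. "restart_qmaster.py $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$" (the script name is made up). The runner argument is only there so the logic can be tested without touching the real init script.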

-----Original Message-----
From: heywood [mailto:heywood at cshl.edu] 
Sent: Monday, May 03, 2010 12:51 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] qmaster SEGVs

We rebooted the node running qmaster, and we are now also getting qmaster
crashes. I see in the archive there is another thread "sgemaster keeps
crashing 6.2u4" from February which apparently is the same issue. After a
number of crashes I got qmaster to keep running (for now!).

We are running 6.2u5 with RHEL4.

I guess there is no solution/resolution?

Todd


sge_qmaster[5851]: segfault at 0000000000000080 rip 00000039fa470560 rsp 000000004780aa38 error 4
sge_qmaster[6163]: segfault at 0000000000000080 rip 00000039fa470560 rsp 000000004780aa38 error 4
sge_qmaster[6573]: segfault at 0000000000000000 rip 00000000005bf6c7 rsp 0000000047809ec0 error 4
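
In case it helps with comparing crashes across sites, here is a small sketch for pulling the fields out of kernel segfault lines like the ones above and decoding the error code. The function names are made up for illustration. The bit meanings follow the x86 page-fault error code (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode), so "error 4" is a user-mode read of an unmapped page. The fault addresses here (0x80 and 0x0) are close to zero, which would be consistent with a NULL pointer dereference, but that is only a guess without a core dump.

```python
import re

# Matches the older x86_64 kernel format used above ("rip"/"rsp" field names).
SEGFAULT_RE = re.compile(
    r"(?P<name>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Translate the low three bits of an x86 page-fault error code."""
    present = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user mode" if code & 4 else "kernel mode"
    return f"{mode} {access}, {present}"


def parse_segfaults(text):
    """Return one dict per segfault record found in the log text."""
    # Mail clients may wrap a record across lines, so collapse whitespace first.
    flat = " ".join(text.split())
    records = []
    for m in SEGFAULT_RE.finditer(flat):
        records.append({
            "name": m.group("name"),
            "pid": int(m.group("pid")),
            "addr": int(m.group("addr"), 16),
            "rip": int(m.group("rip"), 16),
            "rsp": int(m.group("rsp"), 16),
            "error": decode_error(int(m.group("err"), 16)),
        })
    return records


SAMPLE = """
sge_qmaster[5851]: segfault at 0000000000000080 rip 00000039fa470560 rsp
000000004780aa38 error 4
sge_qmaster[6573]: segfault at 0000000000000000 rip 00000000005bf6c7 rsp
0000000047809ec0 error 4
"""

for rec in parse_segfaults(SAMPLE):
    print(rec["pid"], hex(rec["addr"]), hex(rec["rip"]), rec["error"])
```

Grouping the records by rip is a quick way to see whether repeated crashes hit the same instruction, as the first two entries above do.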

On 3/17/10 12:14 PM, "abrookfield" <a.brookfield at sheffield.ac.uk> wrote:

> I'm also having problems with qmaster SEGVs in 6.2u5, running on RHEL5,
> x86_64. 
> 
> Crashes seem to be correlated with users deleting jobs, particularly (but not
> exclusively) OpenMPI parallel jobs which have been running for 'a while'.
> Other than updating to u5 we've not made any config changes to our setup.
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=249186
> 
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=255955


------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=256103
