[GE users] Qmaster hangs frequently (v6.2_u2)

crei crei at sun.com
Thu Jun 25 09:44:56 BST 2009


Hi,


On 06/25/09 00:13, parimi wrote:
> Qping output is convoluted, I think this is a known issue in v6.2,
> there's a thread and bug report about this.

Yes, the qping output is a bit convoluted. I just wanted to be
sure that every thread is reporting its data, which wasn't the
case for worker000, worker001 and listener000. Perhaps
you can wait until all threads report data.


06/11/2009 19:09:55 | worker000: no monitoring data available
06/11/2009 19:09:55 | worker001: no monitoring data available
06/11/2009 19:09:55 | listener000: no monitoring data available
06/11/2009 19:12:59 | listener001: runs: 7.32r/s (in (g:7.28 a:0.00
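
To see at a glance which threads have not reported yet, a small filter over the
monitoring lines can help. This is a minimal sketch that assumes only the
"<date> <time> | <thread>: <message>" layout visible in the excerpt above; it is
not an official qping parser, and the function name silent_threads is
hypothetical. Feed it lines in chronological order (for example, qping output
saved to a file); a thread counts as silent only if its latest line still says
"no monitoring data available".

```python
import re

# Matches the layout shown above: "<date> <time> | <thread>: <message>".
# Thread names are assumed to be letters/underscores followed by digits
# (worker000, listener001, ...).
LINE_RE = re.compile(
    r"^\s*(?P<stamp>.+?)\s*\|\s*(?P<thread>[A-Za-z_]+\d+):\s*(?P<msg>.*)$"
)
NO_DATA = "no monitoring data available"


def silent_threads(lines):
    """Return the threads whose most recent line reports no monitoring data."""
    latest = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            # Later lines overwrite earlier ones, so only the newest status counts.
            latest[m.group("thread")] = m.group("msg").strip()
    return sorted(t for t, msg in latest.items() if msg == NO_DATA)
```

Once this returns an empty list, every thread has reported at least once and the
output is worth comparing against a hang.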

> 
> Monitoring time configured as below:
> 
> $ qconf -sconf | grep -A1 qmaster_params
> qmaster_params               ENABLE_FORCED_QDEL=true
> MONITOR_TIME=00:00:30 \
>                              LOG_MONITOR_MESSAGE=false MAX_DYN_EC=200
> $
> 
> We have 8 core box with 32G RAM, resources on the qmaster isn't a
> problem. Qmaster was busy doing something but not loaded at all when it
> becomes unresponsive to any client requests.
> 
> Any ideas why cleaning few jobs from job spool helps recovering the
> qmaster? Qmaster restart or failover to a shadow master doesn't help
> though.

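My assumption is that some spooled job might have had a reference to
a non-existent or removed object such as a queue, host, etc.

When the qmaster is restarted, the old jobs are read back from the spooling
database/directory, and such a dangling reference might have caused a
deadlock or a similar problem. That would also explain why a restart or a
failover to the shadow master doesn't help (both read the same spooled
jobs), while removing the affected jobs from the spool does.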
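
The suspected condition, a spooled job that still points at a queue or host that
no longer exists, can be illustrated with a simple consistency check. This is a
sketch under loud assumptions: SpooledJob and dangling_references are
hypothetical names, and the code does not read the real Grid Engine spool format.
It assumes you have already extracted, per job, the queue and host names it
refers to, plus the sets of objects that currently exist (for instance from
`qconf -sql`, the cluster queue list, and `qconf -sel`, the execution host list).

```python
from dataclasses import dataclass, field


@dataclass
class SpooledJob:
    # Hypothetical, simplified view of a spooled job: only the names it references.
    job_id: int
    queues: list = field(default_factory=list)
    hosts: list = field(default_factory=list)


def dangling_references(jobs, known_queues, known_hosts):
    """Map job id -> referenced objects that are not in the known sets."""
    problems = {}
    for job in jobs:
        missing = [f"queue {q}" for q in job.queues if q not in known_queues]
        missing += [f"host {h}" for h in job.hosts if h not in known_hosts]
        if missing:
            problems[job.job_id] = missing
    return problems
```

Jobs reported by such a check would be the first candidates to look at when
reproducing the hang.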

It would be nice if you could reproduce the problem and then run qping
again or start up the qmaster in debug mode.



> 
> Thanks, Parimi V.
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=203403
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

-- 
Sun Microsystems GmbH             Christian Reissmann
Dr.-Leo-Ritter-Str. 7             Software Engineer
D-93049 Regensburg                Phone: +49 (0)941 3075 112
Germany                           Fax:   +49 (0)941 3075 222
http://www.sun.de                 mailto: Christian.Reissmann at sun.com
                                   http://www.sun.com/gridengine
Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=203507


