Opened 51 years ago
Last modified 10 years ago
#878 new task
IZ524: wrong setup of jmx server in qmaster results in dropping of jgdi connection
Reported by: | zwierzak | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | hedeby | Version: | 1.0 |
Severity: | | Keywords: | Sun gridengine_adapter |
Cc: | | | |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=524]
Issue #: 524
Platform: Sun
Reporter: zwierzak (zwierzak)
Component: hedeby
OS: All
Subcomponent: gridengine_adapter
Version: 1.0
CC: None defined
Status: NEW
Priority: P3
Resolution:
Issue type: TASK
Target milestone: 1.0u5next
Assigned to: rhierlmeier (rhierlmeier)
QA Contact: rhierlmeier
URL:
Summary: wrong setup of jmx server in qmaster results in dropping of jgdi connection
Status whiteboard:
Attachments:
Issue 524 blocks:
Votes for issue 524:

Opened: Thu Jul 24 04:50:00 -0700 2008
------------------------

Wrong setup of the jmx server in qmaster results in dropping of the jgdi connection.

> If you see a log like this:
>
>> 07/23/2008 19:35:03|15|I|The resource provider has been stopped
>> 07/23/2008 19:35:07|21|I|The spare pool has been stopped.
>> 07/23/2008 19:35:07|18|I|Shutdown finished
>> 07/23/2008 19:35:24|10|I|startup jvm (pid=20414)
>> 07/23/2008 19:35:26|11|I|Secure mbean server started (service:jmx:rmi:///jndi/rmi://foo.bar:48309/system)
>> 07/23/2008 19:35:26|12|I|The spare pool has been started.
>> 07/23/2008 19:35:26|13|I|The reporter has been started.
>> 07/23/2008 19:35:26|15|I|Service service: Starting Grid Engine service
>> 07/23/2008 19:35:27|15|W|Service service: Connection to qmaster has been lost
>> 07/23/2008 19:35:27|15|I|Service service: qmaster not running, try reconnect every 60 seconds
>> 07/23/2008 19:35:28|16|I|The resource provider has been started
>
> the state of the "service" component is started, while the state of the "service" service is unknown.
>
> The reason can be that during installation of GE (JMX step) you didn't specify the password for the server. If you set a password that is too short, the problem is the same.

There should be some error reported by geadapter indicating that the jmx server in qmaster may have problems.

------- Additional comments from zwierzak Tue Aug 5 03:52:20 -0700 2008 -------

Description: Logs of the GE-Adapter are misleading when the jmx thread in qmaster is wrongly configured and does not run.

> If you see a log like this:
>
>> 07/23/2008 19:35:03|15|I|The resource provider has been stopped
>> 07/23/2008 19:35:07|21|I|The spare pool has been stopped.
>> 07/23/2008 19:35:07|18|I|Shutdown finished
>> 07/23/2008 19:35:24|10|I|startup jvm (pid=20414)
>> 07/23/2008 19:35:26|11|I|Secure mbean server started (service:jmx:rmi:///jndi/rmi://foo.bar:48309/system)
>> 07/23/2008 19:35:26|12|I|The spare pool has been started.
>> 07/23/2008 19:35:26|13|I|The reporter has been started.
>> 07/23/2008 19:35:26|15|I|Service service: Starting Grid Engine service
>> 07/23/2008 19:35:27|15|W|Service service: Connection to qmaster has been lost
>> 07/23/2008 19:35:27|15|I|Service service: qmaster not running, try reconnect every 60 seconds
>> 07/23/2008 19:35:28|16|I|The resource provider has been started
>
> the state of the "service" component is started, while the state of the "service" service is unknown.

From the log, a user or administrator would assume that qmaster was running but crashed or dropped the connection. This is misleading, because the jmx thread in qmaster was never up. This happens when the jmx thread in qmaster is wrongly configured (no password for the jmx server, or a password that is too short).

Evaluation: Hedeby itself has no problem; this is just about improving the log entry.

Suggested Fix / Work Around: Improve the log entry. Currently the user needs to go to the qmaster machine and check the logs to see whether the jmx thread is running in order to find out what is going on.

Analysis: With this task the logging in GEAdapterImpl.java should be improved to avoid confusion. Should we contact the GE guys and request that starting qmaster with a wrongly configured jmx thread (one that does not start at all) prints an error message to the user? (File an RFE?) Currently only an error is written to the qmaster logs, and qmaster itself keeps running without the jmx thread. The user is unaware that something went wrong.

How to test: Try to connect to qmaster when qmaster is not running, or when the jmx thread in qmaster is not running. Check that the log entries are the proper ones.
ETC: 2 PD
ATC: 0.5 PD

------- Additional comments from torsten Tue Oct 28 03:30:18 -0700 2008 -------

I stumbled over the same problem (insufficient information in the log file) when there are connection problems from the GE adapter to qmaster. In my case the cause was wrong certificates in the GE adapter configuration. Please find below the description and analysis I did. The proposed fix has an ETC of 0.5 PD.

Unhelpful error message when connection from GE adapter to qmaster is lost

Description: When the connection from the GE adapter to qmaster is lost, the following line appears in the respective VM log:

10/28/2008 08:43:51|12|W|Service geadapter: Connection to qmaster has been lost

Analysis: A GrmException that is thrown from GEConnection.connect() is caught and logged at line 784 in GEServiceImpl.java (message "gsi.lost"). The GrmException contains valuable information about the cause of the connection loss and is even handed to the logging message as a parameter, but this parameter is never used in the gsi.lost message.

=> Add this parameter in the messages.properties file:

gsi.lost = Service {0}: Connection to qmaster has been lost. Cause: {1}

------- Additional comments from rhierlmeier Wed Nov 25 07:21:10 -0700 2009 -------

Milestone changed
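
The effect of torsten's proposed gsi.lost change can be illustrated with a small sketch. It assumes the message is rendered with java.text.MessageFormat-style placeholders, which the {0}/{1} syntax suggests but the ticket does not confirm. The class name GsiLostMessageDemo, the render helper, and the example cause text are hypothetical, and the "current" pattern is only inferred from the logged line quoted above. The point it shows is that an argument with no matching placeholder is silently dropped, so the cause never reaches the log.

```java
import java.text.MessageFormat;

public class GsiLostMessageDemo {
    // Assumed current pattern, inferred from the logged line in the ticket:
    // only {0} is referenced, so the second argument is never printed.
    static final String CURRENT_PATTERN =
            "Service {0}: Connection to qmaster has been lost";

    // Pattern proposed in the ticket: {1} exposes the cause of the failure.
    static final String PROPOSED_PATTERN =
            "Service {0}: Connection to qmaster has been lost. Cause: {1}";

    // Hypothetical helper: formats the pattern with the service name and
    // the message of the exception that caused the connection loss.
    static String render(String pattern, String serviceName, Throwable cause) {
        return MessageFormat.format(pattern, serviceName, cause.getLocalizedMessage());
    }

    public static void main(String[] args) {
        // Stand-in for the GrmException mentioned in the ticket.
        Exception cause = new Exception("jmx server in qmaster is not reachable");

        // The cause is passed in both calls, but only the second pattern prints it.
        System.out.println(render(CURRENT_PATTERN, "geadapter", cause));
        System.out.println(render(PROPOSED_PATTERN, "geadapter", cause));
    }
}
```

With a pattern like the proposed one, a misconfigured jmx thread or a wrong certificate would show up in the adapter log directly, instead of requiring a look at the qmaster logs.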