[GE users] Adding subordinate que using qmon crashes qmaster

Marconnet, James E Mr /Computer Sciences Corporation james.marconnet at smdc.army.mil
Tue Mar 22 22:39:38 GMT 2005


The simple answer to this was a known bug in 6.0u1: 

http://gridengine.sunsource.net/issues/show_bug.cgi?id=1289

Thanks to all.

Jim Marconnet


  _____  

From: Marconnet, James E Mr /Computer Sciences Corporation
[mailto:james.marconnet at smdc.army.mil] 
Sent: Tuesday, March 22, 2005 2:25 PM
To: users at gridengine.sunsource.net
Subject: [GE users] Adding subordinate que using qmon crashes qmaster




We've been running 6.0u1 without problems for some time now, and everyone
tells me nothing was changed Friday. Friday afternoon I started adding
subordinate queues using qmon to try to prevent oversubscribing nodes when
different groups submit jobs to two different queues that include instances
of the same nodes. After I modified several queues without incident, qmaster
started crashing intermittently when I clicked OK to save the changes within
qmon. So far I'm not aware of anything else I can do in qmon that crashes
qmaster. Our IT guys looked at some system logs and saw a message that
suggested a memory problem on that particular node. They did all sorts of
(so far fruitless) hardware swapping, software reloading, etc., and have
settled on "Jim, please don't try to add subordinate queues using qmon!"

The message when qmaster crashes is: ' Unable to contact qmaster using port
536 on host "stingray_2_2_2" ' At other times, when saving from within qmon,
we occasionally get a message about failed GDI instead. The IT folks said
that has something to do with two different clocks being out of
synchronization after qmaster is restarted, and drifting towards or away
from each other. But sometimes I can add or remove a subordinate queue and
it works just fine. When it does crash qmaster, though, it's bad, since I
have to call IT support to restart it.

In moving backwards and forwards through SGE software versions and settings,
they mentioned that an earlier set of settings, made from the command line
without using the qmon GUI, works, but it does not display anything when
brought up in qmon. That baffles me completely. It seems like whether you
make changes from the command line or from qmon ought to make no difference
to the resulting underlying file(s), or to whether qmon can see and edit
them. Perhaps we misunderstand something there. I mention it here just in
case it helps clear things up.

Anyway, they are thinking of upgrading to 6.0u3 on a different node and
hoping things will be better. They are also thinking of having qmaster
running on two (or more!) different nodes at once. I'm not sure quite what
that would accomplish for us except reducing our available production nodes
by one.

Is there anything we should look for or try besides "Just say NO to
subordinate queues!"?

Thanks! 
Jim Marconnet 



