[GE users] possible qstat problem with 6.0u7?
stark at tuebingen.mpg.de
Thu Dec 22 13:26:19 GMT 2005
On Tuesday 20 December 2005 09:45, Marco Donauer - SUN Microsystems wrote:
> I talked to our communication guru, and this is no problem. This message
> appears, if a client
> is stopped with control c or a kill.
Yes, it sometimes happens if people are doing "qlogin -now no" and are tired
of waiting for a slot after an hour or so. They hit ctrl-c and the qlogin
process goes away. Unfortunately the job does not disappear from the qstat
listing. But that's another problem.
> >>Currently I'm not able to reproduce either of the errors. The qstat -F .....
> >>-q .... is working
The "qstat -F ... -q all.q@node1" problem is a bit clearer now:
If I add node1 to the /etc/hosts file, it works. It does not work, however, if
DNS is the only way to resolve the host name. Most notably, this means the
problem does not necessarily have to do with the u4->u7 transition, because I
also switched from /etc/hosts to DNS after the upgrade.
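To tell the two resolution paths apart, a small check like the following could help. It is only a sketch: the function name in_hosts_file is made up for illustration, and the match is an exact, case-sensitive comparison against the names in a hosts-format file.

```shell
# Return 0 if NAME appears as a hostname or alias in a hosts-format file.
# Usage: in_hosts_file NAME [FILE]   (FILE defaults to /etc/hosts)
# Note: in_hosts_file is a hypothetical helper, not part of SGE.
in_hosts_file() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # strip comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit found ? 0 : 1 }
    ' "${2:-/etc/hosts}"
}

# Example with the host name from this thread:
# in_hosts_file node1 && echo "node1 is listed in /etc/hosts"
# getent hosts node1  || echo "the system resolver cannot find node1"
```

If the first check fails while "getent hosts node1" succeeds, the name is presumably coming from DNS (or another source configured in nsswitch.conf), which is the case where qstat fails here.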
> >>and the qstat -j is working too.
The problem with qstat -j still exists, even if I add all hosts to /etc/hosts.
So the two problems do not seem to be related to each other.
> Hm I don't know. I don't think that faulty memory is the reason.
> You're talking about a high load.
> Is this load on the nfs also? In this case the connection to the master
> host could be lost.
The NFS load is high sometimes, yes. For this reason we use a Solaris
fileserver :) I never noticed NFS connection problems. If there were any, I'm
sure the users would complain (as their homes would stop working).
> One other question, did you do an upgrade from u4 to u7 or is this a
> completely new installation with u7?
I upgraded exactly like the upgrade manual said. I also did the backup and bdb
> In case of an upgrade, are you really sure, that all binaries and libs
> are upgraded eg. local binaries or something else?
All binaries and shared libraries in the lib/, bin/ and utilbin/ directories
carry the date "Dec 9 13:41", so they were updated for sure. I also checked
the uptimes of all nodes. There's no way they could still be running an old
execd or something like that, since they all share the same SGE installation
via NFS.
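A date check like this can also be done mechanically instead of by eye. The following is a rough sketch using plain find. The function name not_newer_than and the reference file are assumptions for illustration only.

```shell
# List regular files under the given directories whose modification time
# is not newer than that of the reference file REF.
# Usage: not_newer_than REF DIR...
# Note: not_newer_than is a hypothetical helper, not part of SGE.
not_newer_than() {
    ref=$1
    shift
    find "$@" -type f ! -newer "$ref"
}

# Example (hypothetical reference file stamped shortly before the upgrade):
# touch -t 200512091300 /tmp/upgrade_ref
# not_newer_than /tmp/upgrade_ref /usr/local/sge/lib /usr/local/sge/bin /usr/local/sge/utilbin
```

Any file it prints was last modified at or before the reference time and so would not have been replaced by the upgrade. Empty output would match what I see in the directory listings.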
> To answer your BDB question, you can find it out by looking into the
> bootstrap file (default/common/bootstrap).
> It contains an entry for spooling_method (berkeley_db=BDB,
> classic=classic spooling).
neckar ~ % cat /usr/local/sge/default/common/bootstrap
# Version: 6.0u2
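To pull a single entry such as spooling_method out of that file without reading the whole thing, a sketch like this should do. It assumes the file consists of "#" comment lines plus whitespace-separated "key value" lines. The function name bootstrap_value is made up, and the default path is simply the one from my installation.

```shell
# Print the value of KEY from an SGE bootstrap file; return 1 if KEY is absent.
# Usage: bootstrap_value KEY [FILE]
# Assumption: entries are whitespace-separated "key value" lines.
# Note: bootstrap_value is a hypothetical helper, not part of SGE.
bootstrap_value() {
    awk -v key="$1" '
        $1 == key { print $2; found = 1 }
        END { exit !found }
    ' "${2:-/usr/local/sge/default/common/bootstrap}"
}

# Example:
# bootstrap_value spooling_method
# bootstrap_value default_domain
```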
I'm really concerned about this "Version: 6.0u2" thing. Also, the
"default_domain" entry might solve the problem I have with DNS.
Sebastian Stark -- http://www.kyb.tuebingen.mpg.de/~stark
Max Planck Institute for Biological Cybernetics