[GE users] possible qstat problem with 6.0u7?

Marco Donauer Marco.Donauer at Sun.COM
Thu Dec 22 15:08:02 GMT 2005



Sebastian,

it seems that you also ran into a known issue, which is already fixed in the 
main trunk (issue 1940).
The issue is this: when the host is resolved to its long hostname but the 
-q option filters for the short hostname, no output is produced.
I guess that after you switched from file-based resolving (/etc/hosts) to DNS 
resolving, long hostnames are now being returned.

You can check this easily by running the 
$SGE_ROOT/utilbin/<arch>/gethostname binary on the host where
qstat -F ... -q ... didn't work.
If gethostname prints a long hostname and you call qstat -F ... -q with the 
short hostname, you won't get any output.

The workaround for you: make sure hostname resolving returns the 
hostname in the same form as it is shown in
the qstat -f output; then it should work.
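
To illustrate the mismatch, here is a minimal Python sketch. It is not the
qstat code, only a model of the behaviour described above; the function names
and the host name node1.example.com are made up for the example.

```python
def short_name(host: str) -> str:
    """Return the part of a host name before the first dot, lowercased."""
    return host.split(".", 1)[0].lower()


def filter_matches(resolved: str, wanted: str) -> bool:
    """Exact (case-insensitive) comparison of the resolved name with the
    name given in the filter. This models the behaviour described above."""
    return resolved.lower() == wanted.lower()


def is_mismatch_candidate(resolved: str, wanted: str) -> bool:
    """True if the filter fails only because one name is long and the
    other is short, i.e. both refer to the same short host name."""
    return (not filter_matches(resolved, wanted)
            and short_name(resolved) == short_name(wanted))


# Hypothetical values: what the resolver returns vs. what was typed after -q.
resolved = "node1.example.com"
wanted = "node1"
print(filter_matches(resolved, wanted))         # False: the filter finds nothing
print(is_mismatch_candidate(resolved, wanted))  # True: only the name form differs
```

If the second check is true for the name printed by gethostname and the name
you pass to -q, you are most likely seeing this issue.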

Concerning the qstat -j problem, I have nothing new. Sorry.

The bootstrap file is not a problem. I guess that your first installation 
was a u2?
The bootstrap file was created then and is never changed afterwards,
so the version number in it always stays the same.
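
If you want to read the settings from a bootstrap file in a script, here is a
minimal Python sketch. It assumes the layout shown in the file you quoted
(comment lines starting with "#", otherwise a key followed by whitespace and
a value); parse_bootstrap is a made-up helper name, not part of Grid Engine.

```python
def parse_bootstrap(text: str) -> dict:
    """Parse "key   value" lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


# Excerpt of the bootstrap file quoted in this thread.
SAMPLE = """\
# Version: 6.0u2
#
admin_user             sge
default_domain          none
ignore_fqdn             true
spooling_method         berkeleydb
"""

settings = parse_bootstrap(SAMPLE)
print(settings["spooling_method"])  # berkeleydb
print(settings["ignore_fqdn"])      # true
```

The "# Version" line is a comment, so the sketch skips it.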

Regards,
Marco

Sebastian Stark wrote:

>On Tuesday 20 December 2005 09:45, Marco Donauer - SUN Microsystems wrote:
>  
>
>>I talked to our communication guru, and this is no problem. This message
>>appears, if a client
>>is stopped with control c or a kill.
>>    
>>
>
>Yes, it sometimes happens if people are doing "qlogin -now no" and are tired 
>of waiting for a slot after an hour or so. They hit ctrl-c and the qlogin 
>process goes away. Unfortunately the job does not disappear from the qstat 
>listing. But that's another problem.
>
>  
>
>>>>Currently I'm not able to reproduce either error. The qstat -F .....
>>>>-q .... is working
>>>>        
>>>>
>
>The "qstat -F ... -q all.q@node1" problem is a bit clearer now:
>
>If I add the node1 to the /etc/hosts file it works. It does not work if the 
>only way to resolve the hosts name is DNS however. Most notably this means it 
>does not necessarily have to do with the u4->u7 transition because I also 
>switched from /etc/hosts to DNS after the upgrade.
>
>  
>
>>>>and the qstat -j is working too.
>>>>        
>>>>
>
>Problem with qstat -j still exists, even if I add all hosts to /etc/hosts. So 
>those problems do not seem to be related to each other.
>
>  
>
>>Hm, I don't know. I don't think that faulty memory is the reason.
>>You're talking about a high load.
>>Is this load on the nfs also?  In this case the connection to the master
>>host could be lost.
>>    
>>
>
>The nfs load is high sometimes, yes. For this reason we use a Solaris 
>fileserver :) I never noticed nfs connection problems, if there were any, I'm 
>sure the users would complain (as their homes would stop working)
>
>  
>
>>One other question: did you do an upgrade from u4 to u7, or is this a
>>completely new installation with u7?
>>    
>>
>
>I upgraded exactly like the upgrade manual said. I also did the backup and bdb 
>upgrade step.
>
>  
>
>>In case of an upgrade, are you really sure, that all binaries and libs
>>are upgraded eg. local binaries or something else?
>>    
>>
>
>All binaries and shared libraries in the lib/, bin/ and utilbin/ directories 
>carry the date "Dec  9 13:41", so they were updated for sure. I also checked 
>the uptimes of all nodes, there's no way they could still run an old execd or 
>something like this since all share the same sge installation via nfs.
>
>  
>
>>To answer your BDB question, you can find it out by looking into the
>>bootstrap file (default/common/bootstrap).
>>It contains an entry named spooling_method. (berkeley_db=BDB,
>>classic=classic spooling).
>>    
>>
>
>Hmm:
>
>neckar ~ % cat /usr/local/sge/default/common/bootstrap
># Version: 6.0u2
>#
>admin_user             sge
>default_domain          none
>ignore_fqdn             true
>spooling_method         berkeleydb
>spooling_lib            libspoolb
>spooling_params         /usr/local/sge/default/spool/spooldb
>binary_path             /usr/local/sge/bin
>qmaster_spool_dir       /usr/local/sge/default/spool/qmaster
>security_mode           none
>
>I'm really concerned about this "Version: 6.0u2" thing. Also the 
>"default_domain" might solve the problem I have with dns.
>
>
>Thank you.
>
>
>-Sebastian
>
>  
>

-- 

Marco Donauer            Tel: +49 941 3075-211  (x60211)
Software Engineer        Fax: +49 941 3075-222  (x60222)
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7    mailto:marco.donauer at sun.com
D-93049 Regensburg       http://www.sun.com/gridware
