[GE users] reresolve hostname failed: can't resolve host name

Chris Dagdigian dag at sonsorol.org
Mon May 5 14:19:14 BST 2008


Hi Sangamesh,

The cause of the error is clear and is entirely DNS or hostname
related -- Grid Engine is reading the name of the qmaster host from
the $SGE_ROOT/default/common/act_qmaster file and is then failing to
resolve that hostname to an IP address.

First check the contents of act_qmaster and consider adding the proper
hostname and IP entry to /etc/hosts.
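
As a rough sketch of that first check: the function names below are made
up for illustration, and getent is assumed to be available (it is on
glibc-based Linux systems). getent asks the system resolver, which may
not match exactly what SGE sees, so treat it as a first pass only.

```shell
#!/bin/sh
# Print the qmaster host name recorded in an act_qmaster file.
# The file holds a single host name; blanks and CR characters are stripped.
read_act_qmaster() {
    # $1: path to act_qmaster (defaults to the location named above)
    file=${1:-"$SGE_ROOT/default/common/act_qmaster"}
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    head -n 1 "$file" | tr -d ' \t\r'
}

# Succeed if the system resolver can map the name to an address.
# getent consults /etc/hosts and DNS as configured in /etc/nsswitch.conf.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Combine both steps and print a short verdict.
check_qmaster_name() {
    name=$(read_act_qmaster "$1") || return 1
    if resolves "$name"; then
        echo "OK: $name resolves to $(getent hosts "$name" | awk '{print $1; exit}')"
    else
        echo "FAIL: $name does not resolve; consider an /etc/hosts entry such as:"
        echo "  <qmaster-ip>  $name"
        return 1
    fi
}
```

Running check_qmaster_name with no argument on the host where qstat
fails uses $SGE_ROOT/default/common/act_qmaster. <qmaster-ip> is a
placeholder for the real address of the qmaster host.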

Another good test is to use the binaries located in
$SGE_ROOT/utilbin/<arch>/. There are a number of programs there, such
as gethostbyname and gethostbyaddr. Those binaries are the ones used by
SGE to do DNS lookups, so they are the perfect tool for manually
testing how SGE sees your DNS environment.
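
A minimal sketch of that test, assuming $SGE_ROOT is set and that the
$SGE_ROOT/util/arch script shipped with SGE is present to print the
architecture string (for example lx24-x86 or lx24-amd64). The wrapper
function names are hypothetical; only gethostbyname and gethostbyaddr
are the real SGE binaries.

```shell
#!/bin/sh
# Locate the utilbin directory for this host's architecture.
# $SGE_ROOT/util/arch prints the architecture string used in the path.
sge_utilbin_dir() {
    arch=$("$SGE_ROOT/util/arch") || return 1
    echo "$SGE_ROOT/utilbin/$arch"
}

# Resolve a host name with SGE's own gethostbyname binary.
sge_lookup_name() {
    dir=$(sge_utilbin_dir) || return 1
    "$dir/gethostbyname" "$1"
}

# Reverse-resolve an IP address with SGE's own gethostbyaddr binary.
sge_lookup_addr() {
    dir=$(sge_utilbin_dir) || return 1
    "$dir/gethostbyaddr" "$1"
}
```

For example, sge_lookup_name test.locuzcluster.org on the submit host
should show whether SGE can resolve the name stored in act_qmaster, and
sge_lookup_addr with the qmaster's IP should map back to the same name.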

-Chris




On May 5, 2008, at 3:02 AM, Sangamesh B wrote:

> Hi all,
>
> The cluster has three systems: a master, a node, and a submit host.
>
> The master and the node have dual-core, dual-processor AMD64
> Opterons, with Rocks 4.3 x86_64 as the OS.
>
> Submit host: AMD Athlon 64-bit processor, with the 32-bit version of
> RHEL 3 AS as the OS.
>
> SGE is installed on the master and the node as lx24-amd64, and on the
> submit host I untarred the lx24-x86 package and copied
> $SGE_ROOT/default from the master node.
>
> The following is the error:
>
> # qstat -f
> reresolve hostname failed: can't resolve host name
>
> I observe that when the master system boots, only sge_execd starts
> by default; sge_qmaster doesn't start.
>
> Starting sge_qmaster manually (/etc/init.d/sgemaster start) fails with:
> # /etc/init.d/sgemaster start
>
> sge_qmaster didn't start!
> This is not a qmaster host!
> Please, check your act_qmaster file!
>
> After I changed the content of $SGE_ROOT/default/common/act_qmaster
> from test.local to test.locuzcluster.org, the manual start of
> sge_qmaster worked.
>
> On the submit host, act_qmaster also contained "test.local"; I changed
> it to test.locuzcluster.org, but qstat -f still throws the same error:
>
> reresolve hostname failed: can't resolve host name
>
> Does anyone know what might be causing this error?
>
> Thanks,
> Sangamesh

