[GE users] Getting mvapich tight integration working

Andy Schwierskott andy.schwierskott at sun.com
Tue Aug 1 15:54:13 BST 2006


Hi,

SGE host aliasing is not designed to help with reverse-mapping issues, so
you need to make sure that reverse mapping (IP address to host name) works on
all hosts.
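
A minimal sketch of such a check, assuming Python is available on the nodes.
The function name check_reverse_mapping and its injectable resolver
parameters are hypothetical, chosen for illustration; the defaults are the
standard library calls socket.gethostbyaddr and socket.gethostbyname_ex. It
does not replace the SGE gethostbyaddr utility, it only shows the round trip
that has to succeed.

```python
import socket


def check_reverse_mapping(ip,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Check that ip maps to a name and that the name maps back to ip.

    Returns (ok, detail). The resolver arguments are injectable so the
    logic can be exercised without touching the real resolver.
    """
    try:
        # Reverse mapping: IP address -> primary host name.
        name, _aliases, _addrs = reverse(ip)
    except OSError as exc:  # socket.herror / socket.gaierror are OSError subclasses
        return False, f"reverse lookup of {ip} failed: {exc}"

    try:
        # Forward mapping: host name -> list of IP addresses.
        _name, _aliases, addrs = forward(name)
    except OSError as exc:
        return False, f"forward lookup of {name} failed: {exc}"

    if ip not in addrs:
        return False, f"{ip} -> {name} -> {addrs} (mismatch)"
    return True, f"{ip} <-> {name}"
```

Calling check_reverse_mapping("192.168.0.232") on each node, for the address
of every other node, should return True everywhere before SGE is expected to
work.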

The SGE host aliases file is designed to enforce a different primary name
for a host if the original name is not one that all components can use to
talk to each other.
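
To illustrate that name-to-name mapping, here is a small sketch that reads
host_aliases content the way sge_h_aliases(5) describes it, as far as I
recall: each line is a whitespace-separated list of host names, and the first
name on the line is the primary name the others map to. No IP addresses are
involved. The function parse_host_aliases and the sample line are
hypothetical and are not SGE code.

```python
def parse_host_aliases(text):
    """Map every name on a host_aliases line to that line's primary name.

    Assumption: one host per line, names separated by whitespace,
    first name is the primary one.
    """
    primary_of = {}
    for line in text.splitlines():
        names = line.split()
        if not names:
            continue  # skip blank lines
        primary = names[0]
        for name in names:
            primary_of[name] = primary
    return primary_of


# Hypothetical entry built from the host names used in this thread.
EXAMPLE = "compute-0-22.local compute-0-22 c0-22\n"
ALIASES = parse_host_aliases(EXAMPLE)
```

With this entry, every name in ALIASES maps to compute-0-22.local. The file
only picks which name SGE uses; it cannot make an unresolvable IP address
resolve.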

Andy

> Greetings,
>
> Thanks for your help!
>
> On Tue, 1 Aug 2006, Christian.Reissmann at Sun.COM wrote:
>
>> It seems that the host "compute-0-19.local" can't resolve the IP address of
>> host "compute-0-22.local". Can you please check IP resolution on both
>> hosts, so that each host can resolve the IP address of the other?
>
> [akrherz at compute-0-19 ~]$ /opt/gridengine/utilbin/lx26-amd64/gethostbyaddr -all 192.168.0.232
> error resolving ip "192.168.0.232": can't resolve ip address (h_errno = HOST_NOT_FOUND)
>
>
> [akrherz at compute-0-22 ~]$ /opt/gridengine/utilbin/lx26-amd64/gethostbyaddr -all 192.168.0.235
> error resolving ip "192.168.0.235": can't resolve ip address (h_errno = HOST_NOT_FOUND)
>
>
> So I tried the /opt/gridengine/default/common/host_aliases file again and
> restarted sgemaster and sgeexecd everywhere:
>
> [akrherz at compute-0-19 common]$ grep  192.168.0.232 host_aliases
> compute-0-22.local 192.168.0.232
> compute-0-22 192.168.0.232
> c0-22 192.168.0.232
>
> [akrherz at compute-0-19 common]$ /opt/gridengine/utilbin/lx26-amd64/gethostbyaddr -all 192.168.0.232
> error resolving ip "192.168.0.232": can't resolve ip address (h_errno = HOST_NOT_FOUND)
>
> So then I tried adding all of the compute nodes to /etc/hosts on all the
> cluster nodes, and now the address resolves:
>
> [root at compute-0-19 ~]# /opt/gridengine/utilbin/lx26-amd64/gethostbyaddr -all 192.168.0.232
> Hostname: compute-0-22.local
> SGE name: compute-0-22.local
> Aliases:  compute-0-22
> Host Address(es): 192.168.0.232
>
> So I tried my test code again and got fewer errors, but it is still not working.
>
> http://mesonet.agron.iastate.edu/pickup/mvapich_debug2.txt.gz
>
> The firewall is off on the cluster nodes.
>
> Ideas?  thanks!
>  daryl
>
>
