[GE users] Getting mvapich tight integration working

Christian.Reissmann at Sun.COM Christian.Reissmann at Sun.COM
Tue Aug 1 09:20:28 BST 2006


From the logging output I found the following:

=> remote resolved component host name (local host) : compute-0-22.local
=> local resolved component host name (local host) : compute-0-22.local
=> remote resolved component host name (receiver host) : derecho.agron.iastate.edu
=> local resolved component host name (receiver host) : derecho.agron.iastate.edu
=> remote resolved component host name (sender host) : compute-0-22.local
=> local resolved component host name (sender host) : compute-0-22.local
=> requested local component id from server is 401
=> requested sender component id from server is 401

Local host: compute-0-22.local
master host: derecho.agron.iastate.edu  (derecho.agron.iastate.edu/qmaster/1)

=> try to find handle for execd_handle
=> ignoring component_id
=> new message for:       "compute-0-19/execd/1"

execd host for qrsh -inherit: "compute-0-19.local/execd/1"

=> Connect Error: 3
=> error: client IP resolved to host name "". This is not identical to clients host name ""
=> add application error id:  access denied
=> add application error:  client IP resolved to host name "". This is not identical to clients host name ""
access denied (client IP resolved to host name "". This is not identical to clients host name "")
=> deleting unsend message for connection
=> can't find connection to: compute-0-19.local
=> Connect Error: 3
=> error: client IP resolved to host name "". This is not identical to clients host name ""
=> add application error id:  access denied
=> add application error:  client IP resolved to host name "". This is not identical to clients host name ""
=> deleting unsend message for connection
=> can't find connection to: compute-0-19.local
error: executing task of job 271 failed: failed sending task to execd at compute-0-19: can't find connection
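
If the full debug file is long, the repeated failures can be counted with a
few lines of Python. This is only a sketch: it assumes the debug file has
been unpacked to plain text and that the failure lines contain the text
"can't find connection to:" exactly as in the excerpt above. The function
name unreachable_hosts is made up for illustration and is not part of
Grid Engine.

```python
import re
import sys
from collections import Counter

# Matches lines such as: => can't find connection to: compute-0-19.local
UNREACHABLE = re.compile(r"can't find connection to:\s*(\S+)")


def unreachable_hosts(lines):
    """Count how often each host appears in a "can't find connection to:" line."""
    counts = Counter()
    for line in lines:
        match = UNREACHABLE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python scan_debug.py <unpacked debug file>
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for host, count in unreachable_hosts(handle).most_common():
                print(f"{host}: {count} failed connection(s)")
```

A host that shows up here is one the qrsh -inherit client could not open a
connection to, so it is the first place to check name resolution.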


It seems that the host "compute-0-19.local" can't resolve the IP address of host 
"compute-0-22.local". Please check name resolution on both hosts, so that
each host can resolve the other's IP address.
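
One way to run this check the same way on every node is a small Python
script that resolves a name to an address and then resolves that address
back to a name. This is a rough sketch and not a Grid Engine tool: it uses
the system resolver through Python's socket module, which may not match in
every detail what the Grid Engine gethostbyname and gethostbyaddr utilities
report. The function name check_host and the result layout are made up for
illustration.

```python
import socket
import sys


def check_host(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name to an IP and back, and report whether the two agree.

    Returns a dict with the keys name, ip, reverse_name, ok and error.
    The forward and reverse arguments exist only so the lookups can be
    replaced in tests.
    """
    result = {"name": name, "ip": None, "reverse_name": None,
              "ok": False, "error": None}
    try:
        result["ip"] = forward(name)
        primary, aliases, _addresses = reverse(result["ip"])
    except OSError as exc:  # covers socket.gaierror and socket.herror
        result["error"] = str(exc)
        return result
    result["reverse_name"] = primary
    # The name is accepted if it matches the primary name or any alias.
    known = {primary.lower()} | {alias.lower() for alias in aliases}
    result["ok"] = name.lower() in known
    return result


if __name__ == "__main__":
    # Usage: python check_resolve.py compute-0-19.local compute-0-22.local
    for host in sys.argv[1:]:
        print(check_host(host))
```

Run it on compute-0-19 and on compute-0-22, giving both host names each
time. An entry with ok set to False or with an error text shows which
lookup fails on which host.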


Regards,

Christian



Daryl Herzmann wrote On 07/31/06 15:57,:
> Hi!
> 
> Thanks for your help!
> 
> On Mon, 31 Jul 2006, Christian.Reissmann at Sun.COM wrote:
> 
>> Is it guaranteed that all submit hosts (qrsh -inherit) and execd hosts 
>> can resolve each other? The execution daemon has to be able to resolve 
>> the IP addresses of the qrsh -inherit clients. You can check this with the
>>
>> gethostbyaddr -all x.x.x.x
> 
> 
> Appears to be okay, for example...
> 
> $ /opt/gridengine/utilbin/lx26-amd64/gethostbyaddr -all 192.168.0.254
> Hostname: compute-0-0.local
> SGE name: compute-0-0.local
> Aliases:  compute-0-0
> Host Address(es): 192.168.0.254
> 
>> call on the execd hosts. The other gethostby... calls may also be 
>> helpful.
> 
> 
> They appear to be just fine.
> 
>> You can also set the environment variable SGE_COMMLIB_DEBUG before 
>> starting the qrsh command ...
>>
>> setenv SGE_COMMLIB_DEBUG 3
> 
> 
> I ran my test enabling this and got a lot of messages.
> 
> http://mesonet.agron.iastate.edu/pickup/mvapich_debug.txt.gz
> 
> Perhaps somebody smarter than me can decipher them!
> 
> thanks :)
>    daryl
> 

-- 
Christian Reissmann    Tel: +49 (0)941 3075 112  mailto:crei at sun.com
Software Engineer      Fax: +49 (0)941 3075 222  http://www.sun.com/gridengine
Sun Microsystems GmbH, Dr.-Leo-Ritter-Str. 7,
D-93049 Regensburg,    Tel: +49 (0)941 3075 0
