[GE users] Two networks cards in qmaster with SGE 6.0 problem

christian reissmann christian.reissmann at sun.com
Wed Jun 30 08:38:19 BST 2004


Reuti wrote:
> Christian,
> 
>> you can try to use the ip adress of your qmaster:
>>
>> qping x.x.x.x [qmaster port nr.] qmaster 1
> 
> 
> with neither of the TCP/IP addresses it's working on the master itself 
> (after using the patch of act_qmaster during startup).
> 
> Can you use qping only on the primary (in our case: external) side of 
> the network? And when you patch the act_qmaster file some cached entries 
> in qmasterd are no longer valid, so it gets confused (at least for 
> qpings)? Maybe it's best, first to get the issue with the changed 
> act_qmaster file during startup resolved, and then to have a look again 
> at qping.

qping should work, because it only connects to the given TCP/IP address
(or to the IP address returned by gethostbyaddr() from the local host name
resolving setup) and the specified port. If an application (qmaster) that
uses commlib has bound this port, the qping command should get an answer.
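
On a host with two network cards it can help to see which addresses the
local resolver actually returns for the qmaster host name. The following is
only a minimal sketch: it uses Python's own getaddrinfo() lookup, not the
lookup commlib performs, and the function name resolve_ipv4 is a
hypothetical helper chosen for this illustration.

```python
import socket


def resolve_ipv4(host):
    """Return the sorted, de-duplicated IPv4 addresses that `host` resolves to.

    This asks the local resolver (hosts file, DNS, ...) through getaddrinfo().
    It is an illustration only and does not reproduce commlib's lookup.
    """
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) tuple.
    return sorted({info[4][0] for info in infos})


# Example: show what the local host name resolves to on this machine.
# print(resolve_ipv4(socket.gethostname()))
```

If the printed address belongs to the other network interface than the one
you expected, that may explain why a client ends up on the "wrong" side of
the network.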

Can you please try to telnet to the qmaster:

 >telnet [YOUR UP AND RUNNING QMASTER HOSTNAME] [YOUR QMASTER PORT]

The expected output should be:
Trying [IP ADDRESS OF QMASTER HOST]...
Connected to [QMASTER HOST NAME].
Escape character is '^]'.
[After a timeout (1 minute) the qmaster should close this connection and
you should get the next line]
Connection closed by foreign host.
 >

If you get something like

"Unable to connect to remote host: Connection refused"

the qmaster application has not bound the qmaster port.
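
If telnet is not available, the same connect test can be approximated with a
few lines of Python. This is a sketch under assumptions, not part of Grid
Engine: the function name qmaster_port_status and the three result strings
are made up for this example, and it only checks whether a TCP connection
can be opened, not whether the listener is really a qmaster.

```python
import socket


def qmaster_port_status(host, port, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns "connected" if the connection was accepted, "refused" if nothing
    has bound the port, and "unreachable" for timeouts, name resolution
    failures and other network errors.
    """
    try:
        # The connection is closed again as soon as the with-block ends.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "unreachable"


# Example (host name and port are placeholders, use your own values):
# print(qmaster_port_status("YOUR_QMASTER_HOSTNAME", YOUR_QMASTER_PORT))
```

A result of "refused" corresponds to the "Connection refused" message from
telnet above. Running the check once per IP address of the qmaster host
shows on which side of the network the port is reachable.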





If you want to debug the communication messages you can do it in the 
following way:

- first shut down your qmaster daemon if it is running
- switch to root on your qmaster host
- go to the SGE_ROOT directory and source the file
"$SGE_ROOT/default/common/settings.(c)sh"
- source $SGE_ROOT/util/dl.(c)sh
- type "dl 1" to enable debug mode of qmaster
- set the environment variable SGE_COMMLIB_DEBUG to a value between 1 
and 4:
     e.g.: setenv SGE_COMMLIB_DEBUG 2
- start the qmaster:
     $SGE_ROOT/bin/[YOUR ARCHITECTURE]/sge_qmaster


If this is too much output, it is possible to switch on only commlib
debugging:

- first shut down your qmaster daemon if it is running
- switch to root on your qmaster host
- go to the SGE_ROOT directory and source the file
"$SGE_ROOT/default/common/settings.(c)sh"
- source $SGE_ROOT/util/dl.(c)sh
- type "dl 0" to disable debug mode of qmaster
- set the environment variable "SGE_ND" to prevent qmaster from
daemonizing:
     e.g.: setenv SGE_ND 1
- set the environment variable SGE_COMMLIB_DEBUG to a value between 1
and 4:
     e.g.: setenv SGE_COMMLIB_DEBUG 2
- start the qmaster:
     $SGE_ROOT/bin/[YOUR ARCHITECTURE]/sge_qmaster


NOTE:
======
The commlib debug info (set with SGE_COMMLIB_DEBUG) still has to be
reworked, because not all messages displayed as "error:" are really errors.
But it should print out some interesting messages.


-- 
Christian Reissmann    Tel: +49 (0)941 3075 112  mailto:crei at sun.com
Software Engineer      Fax: +49 (0)941 3075 222 
http://www.sun.com/gridengine
Sun Microsystems GmbH, Dr.-Leo-Ritter-Str. 7,
D-93049 Regensburg,    Tel: +49 (0)941 3075 0

