[GE users] Problems with multiple network interfaces

jordib jbllistes at gmail.com
Sat Aug 8 10:57:46 BST 2009


I have a problem with a cluster whose execution nodes have dual network interfaces.
The execution nodes have 2 NICs: the first one (eth0) is for shared data and the second (eth1) is for compute.
We have one node that acts as the sgemaster and two more that are submit hosts. All of these nodes have only one NIC, and they use static routes to reach the execution hosts.

I've followed the instructions at: http://gridengine.sunsource.net/howto/multi_intrfcs.html

My host_aliases file lists only the execution nodes:
ciqtc01-1       g1node1
ciqtc01-10      g1node10
ciqtc01-11      g1node11
ciqtc01-12      g1node12

The first column is the eth1 (compute) name and the second is the eth0 (data) name.
/etc/hosts includes all of these hostnames, each with its own IP.
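
To double-check that last point, a small sketch along these lines could be used. It assumes both files are plain whitespace-separated text with optional "#" comments; check_aliases is a hypothetical helper name of my own, not an SGE command.

```shell
#!/bin/sh
# check_aliases ALIASES_FILE HOSTS_FILE
# Hypothetical helper (not part of SGE): prints "missing: NAME" for every
# name in ALIASES_FILE that has no entry in HOSTS_FILE.
check_aliases() {
    aliases_file=$1
    hosts_file=$2
    # Drop comments and print every name of the aliases file on its own line.
    awk '{ sub(/#.*/, ""); for (i = 1; i <= NF; i++) print $i }' "$aliases_file" |
    while read -r name; do
        # A hosts line is "IP name [name...]": look for an exact match
        # in fields 2..NF and exit non-zero when the name was never seen.
        if ! awk -v n="$name" '
            { sub(/#.*/, ""); for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"
        then
            echo "missing: $name"
        fi
    done
}
```

It would be called as: check_aliases "$SGE_ROOT/$SGE_CELL/common/host_aliases" /etc/hosts
No output means every name in host_aliases has an entry in the hosts file. It only checks that the names are present, not that the IPs are the right ones.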

If I run qstat -f, SGE shows this output:
queuename                      qtype resv/used/tot. load_avg arch          states
all.q@g1node1                  BIP   0/0/1          -NA-     -NA-          au

If I run qhost, the aliases appear:
sgemaster:~# qhost
global                  -               -     -       -       -       -       -
ciqtc01-1               lx24-amd64      4  0.00    7.8G  472.8M    7.4G     0.0
ciqtc01-10              lx24-amd64      4  0.00    7.8G   71.8M    7.4G     0.0

If I look at the node configuration, the alias appears there as well:
sgemaster:~# qconf -se g1node1 | grep hostname
hostname              ciqtc01-1

But when I try to submit a job, it is rejected because the aliased name is unknown to SGE at this level:
sgemaster:~# qrsh -q interactive.q@g1node1
Job was rejected because job requests unknown queue "interactive.q@ciqtc01-1"

Any suggestion?

Many thanks,



