[GE users] sge 6 on hosts with multiple network interfaces

Chris Dagdigian dag at sonsorol.org
Wed Mar 1 11:24:09 GMT 2006


The solution to multiple NICs is the SGE host_aliases file - you can  
alias whatever SGE resolves to be the hostname to whatever other  
hostname or IP you want. But before you do that you need to figure  
out why there is a hostname mismatch between act_qmaster and SGE when  
it starts up.

To specifically debug the error message below, I'd recommend the  
following:

o carefully check forward and reverse DNS resolution to make sure  
things match (SGE is very sensitive to DNS issues)
o carefully check /etc/hosts for typos or mismatches with what DNS says
o Ideally, your SGE qmaster should have a fully qualified hostname in  
DNS that also reverse resolves when its IP address is queried
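
The forward/reverse check above can be scripted. What follows is only a rough sketch, assuming a system with getent (which consults /etc/hosts, DNS and NIS according to nsswitch.conf); the function names normalize_name and check_host are made up for this example and are not part of SGE.

```shell
#!/bin/sh
# Sketch only: verify that a name forward-resolves to an address and that
# the address reverse-resolves back to the same name.
# Assumes getent is available; function names are illustrative.

# Lowercase a hostname and strip one trailing dot so names compare cleanly.
normalize_name() {
    printf '%s\n' "$1" | tr 'A-Z' 'a-z' | sed 's/\.$//'
}

# check_host NAME
# Prints "NAME -> ADDRESS -> REVERSE_NAME" and returns 0 only if the
# reverse name matches NAME after normalization.
check_host() {
    name=$1
    addr=$(getent hosts "$name" | awk 'NR==1 {print $1}')
    if [ -z "$addr" ]; then
        echo "forward lookup failed for $name"
        return 1
    fi
    back=$(getent hosts "$addr" | awk 'NR==1 {print $2}')
    echo "$name -> $addr -> ${back:-<no reverse entry>}"
    [ "$(normalize_name "$back")" = "$(normalize_name "$name")" ]
}

# Example (run on the qmaster and on each exec host):
#   check_host "$(hostname)" || echo "forward/reverse mismatch"
```

A mismatch reported here is not automatically an SGE problem, but it is the kind of inconsistency worth cleaning up before touching host_aliases.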

Then --

Run the command "hostname" to see what your host thinks its core  
hostname is

Run the command $SGE_ROOT/utilbin/<arch>/gethostname to  
see what SGE thinks the hostname is

Basically you should spend some time running the hostname related  
binaries in the SGE utilbin/<arch>/ directory -- that will show you  
what SGE thinks the hostname is and should make the act_qmaster error  
much clearer. Once you know where/how the mismatch is occurring you  
can fix the root cause (if there is one) or get more control over  
what you want via the SGE host_aliases file.
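
To put the three views of the hostname side by side, something like the sketch below may help. It assumes the usual layout where the utility binaries live in $SGE_ROOT/utilbin/<arch>/ and the $SGE_ROOT/util/arch script prints the architecture string; show_sge_names is a hypothetical helper name, and the output of gethostname is printed as-is rather than parsed.

```shell
#!/bin/sh
# Sketch only: print what the OS, SGE, and act_qmaster each say the
# hostname is. show_sge_names is an illustrative helper, not an SGE tool.

# show_sge_names UTILBIN_DIR ACT_QMASTER_FILE
show_sge_names() {
    utilbin=$1
    act_qmaster=$2

    echo "hostname:     $(hostname)"

    if [ -x "$utilbin/gethostname" ]; then
        echo "gethostname output:"
        # Indent SGE's own output; no assumption is made about its format.
        "$utilbin/gethostname" | sed 's/^/    /'
    else
        echo "gethostname:  not found in $utilbin"
    fi

    if [ -r "$act_qmaster" ]; then
        echo "act_qmaster:  $(cat "$act_qmaster")"
    else
        echo "act_qmaster:  cannot read $act_qmaster"
    fi
}

# Example, assuming SGE_ROOT and SGE_CELL are set in the environment:
#   show_sge_names "$SGE_ROOT/utilbin/$("$SGE_ROOT/util/arch")" \
#                  "$SGE_ROOT/$SGE_CELL/common/act_qmaster"
```

If the name in act_qmaster differs from what gethostname reports on the qmaster machine, that difference is the likely source of the "This is not a qmaster host!" message.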


From the man page:

> NAME
>      host_aliases - Grid Engine host aliases file format
>
> DESCRIPTION
>      All Grid Engine components use a hostname resolving service
>      provided by the communication library to identify hosts via
>      a unique hostname. The communication library itself
>      references standard UNIX directory services such as DNS, NIS
>      and /etc/hosts to resolve hostnames. In rare cases these
>      standard services cannot be setup cleanly and Grid Engine
>      communication daemons running on different hosts are unable
>      to automatically determine a unique hostname for one or all
>      hosts which can be used on all hosts. In such situations a
>      Grid Engine host aliases file can be used to provide the
>      communication daemons with a private and consistent hostname
>      resolution database.
>
>      The location for the host aliases file is
>      <sge_root>/<cell>/common/host_aliases.
>
> FORMAT
>      For each host a single line must be provided with a blank,
>      comma or semicolon separated list of hostname aliases. The
>      first alias is defined to be the unique hostname which will
>      be used by all Grid Engine components using the hostname
>      aliasing service of the communication library.
>
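
To make the format concrete, here is a small sketch that writes a sample file and looks up the unique name for an alias. The hostnames (master.example.com, node01-eth1 and so on), the sample file path and the unique_hostname function are all invented for illustration. The lookup only mimics the "first name on the line wins" rule described in the man page; it is not how the communication library itself is implemented.

```shell
#!/bin/sh
# Sketch only: illustrate the host_aliases line format with made-up names.
# A real file would live at <sge_root>/<cell>/common/host_aliases.

sample=${TMPDIR:-/tmp}/host_aliases.example

# One line per host; the first name is the unique hostname, the rest are
# aliases. Blank, comma and semicolon are all accepted as separators.
cat > "$sample" <<'EOF'
master.example.com master-eth1.example.com master-eth1
node01.example.com,node01-eth1.example.com;node01-eth1
EOF

# unique_hostname ALIASES_FILE NAME
# Prints the first name on the line that contains NAME, or nothing if
# NAME does not appear in the file.
unique_hostname() {
    awk -v want="$2" '
        {
            gsub(/[,;]/, " ")   # treat comma and semicolon like blanks
            for (i = 1; i <= NF; i++)
                if ($i == want) { print $1; exit }
        }
    ' "$1"
}

unique_hostname "$sample" master-eth1
unique_hostname "$sample" node01-eth1.example.com
```

With a file like this, the name on the second interface and the primary name both map to the same unique hostname, which is the effect you want on a multi-NIC qmaster.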



Regards,
Chris





On Mar 1, 2006, at 4:11 AM, Pieter Kroon wrote:

> I can't get sge6 to work on hosts with multiple network  
> interfaces.
>
> For sge5.3 there is a nice (working) howto:
> http://gridengine.sunsource.net/project/gridengine/howto/multi_intrfcs.html
>
>
> On sge6, when following the howto, after editing the
> SGE_ROOT/SGE_CELL/common/act_qmaster file and restarting the daemons, I get
> the following error:
>
> sge_qmaster didn't start!
> This is not a qmaster host!
> Please, check your act_qmaster file!
>
> Did anyone solve this problem?
>
>
> Pieter Kroon
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
