[GE users] How to install submit host for SGE 6.0u3

Pacey, Mike m.pacey at lancaster.ac.uk
Thu May 5 14:15:06 BST 2005


Problem solved! Trying to run 'qsub' from my new submit host finally
gave the hint I needed:

"Unable to run job: denied: client (<FQDN>/qsub/47) uses newer GDI
version 268439552 while qmaster uses older version 6.0u1."

I'd thought the qmaster was using u3! Given how useful that error
message is, can the fact that qstat never shows it be listed as a bug?
Showing it would certainly help other unfortunates who inadvertently mix
and match patch levels.
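
A minimal sketch of a check that might catch this kind of mismatch before
submitting anything. The helper name same_ge_version is made up for
illustration, and it assumes each host can supply a one-line version banner
(for example the first line of 'qstat -help', which I'm assuming prints
something like "GE 6.0u3"):

```shell
# Hypothetical helper: compare the version banners of two Grid Engine installs.
# $1 = banner from the submit host, $2 = banner from the qmaster host.
# How the banners are gathered is up to you; one assumed way is
#   qstat -help | head -1
# run on each host.
same_ge_version() {
  if [ "$1" = "$2" ]; then
    echo "match: $1"
  else
    echo "MISMATCH: submit host has '$1', qmaster host has '$2'"
    return 1
  fi
}
```

Calling it with the two banners returns non-zero on a mismatch, so it can
gate a wrapper script around qsub.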

I've installed the u3 update on my qmaster and execd hosts, and it's now
working fine. A couple of quirks I found along the way:

Section 8 of the patch instructions at
http://gridengine.sunsource.net/project/gridengine/install60patch.txt
says to run:

# /etc/init.d/sgemaster etc/init.d/sgeexecd

I believe this should be:

# /etc/init.d/sgemaster 
# /etc/init.d/sgeexecd

or possibly just:

# /etc/init.d/sgemaster start

Also, my sgemaster failed to start properly. The spool messages file
showed that the cause was an SGE-specific 64-bit library linking
problem. I fixed this by adding the following lines to the sgemaster and
sgeexecd startup scripts, immediately after the line that sets the
SGE_CELL environment variable, so that the 64-bit library search path is
set up correctly:

if test -z "${LD_LIBRARY_PATH_64}"; then
  LD_LIBRARY_PATH_64=${SGE_ROOT}/lib/sol-sparc64
else
  LD_LIBRARY_PATH_64="${SGE_ROOT}/lib/sol-sparc64:${LD_LIBRARY_PATH_64}"
fi
export LD_LIBRARY_PATH_64
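
If the same fix is needed in several scripts, the test/else logic above
could be factored into a small function. This is only a sketch: the name
prepend_path and the /opt/sge fallback are assumptions for illustration,
not anything shipped with SGE.

```shell
# Hypothetical helper: print directory "$1" prepended to the colon-separated
# list "$2", without leaving a stray trailing colon when the list is empty.
prepend_path() {
  if test -z "$2"; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$1:$2"
  fi
}

# Example use. /opt/sge is an assumed fallback; sol-sparc64 is the
# architecture directory used in the lines above.
SGE_ROOT=${SGE_ROOT:-/opt/sge}
LD_LIBRARY_PATH_64=$(prepend_path "${SGE_ROOT}/lib/sol-sparc64" "${LD_LIBRARY_PATH_64:-}")
export LD_LIBRARY_PATH_64
```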

Regards,
Mike.

-----

Dr Mike Pacey,                         Email: M.Pacey at lancaster.ac.uk
High Performance Systems Support,      Phone: 01524 593543
Information Systems Services,            Fax: 01524 594459
Lancaster University,
Lancaster LA1 4YW

-----

Date: Tue, 3 May 2005 16:01:47 +0100
From: Pacey, Mike <m.pacey at lancaster.ac.uk>
Subject: [GE users] How to install submit host for SGE 6.0u3


Rayson,

Thanks for the suggestions - coming from 5.2.2, I'd not come across
qping. The tests seem to work fine:

./bin/lx24-x86/qping -info <FQDN> 538 qmaster 1
05/03/2005 15:47:07:
SIRM version:             0.1
SIRM message id:          1
start time:               01/26/2005 15:34:30 (1106753670)
run time [s]:             8377957
messages in read buffer:  0
messages in write buffer: 0
nr. of connected clients: 93
status:                   0
info:                     ok
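
For a scripted version of this test, the "info:" line of the output above
can be checked directly. A rough sketch, with qping_info_ok being a made-up
name and the output layout assumed to match what is pasted here:

```shell
# Hypothetical helper: read 'qping -info' output on stdin and succeed only
# if an "info:" line is present and its value is "ok".
qping_info_ok() {
  awk '$1 == "info:" { found = 1; ok = ($2 == "ok") }
       END { exit !(found && ok) }'
}
```

Piping the qping command shown above into qping_info_ok gives an exit
status that a cron job or install script can act on.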

gethost(by)(name|addr) look like they're working too, though the addr
version returns the comma-separated list of names for the qmaster node
from the local /etc/hosts.

I take it my actual install process is valid, i.e. untarring the common
and binaries files, and then NFS-mounting the <CELL>/common directory
read-only?
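
One quick sanity check on the mount, sketched under the assumption that the
cell's common directory holds an act_qmaster file naming the master host
(check_cell_common is a made-up name):

```shell
# Hypothetical check: confirm the cell's common directory is readable from
# this host and print the qmaster name recorded in it.
# $1 = path to the mounted <CELL>/common directory.
check_cell_common() {
  common_dir=$1
  if [ ! -r "$common_dir/act_qmaster" ]; then
    echo "cannot read $common_dir/act_qmaster" >&2
    return 1
  fi
  echo "qmaster: $(head -n 1 "$common_dir/act_qmaster")"
}
```

If the name printed differs from the host that qping was pointed at, the
submit host is probably looking at the wrong cell or a stale mount.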

Regards,
Mike.

-----

From: Rayson Ho <raysonho at eseenet.com>
Date: Thu, 28 Apr 2005 09:37:06 PST
Subject: [GE users] How to install submit host for SGE 6.0u3


It could be a hostname resolution problem. Can qping reach the qmaster
from that host? Also, use gethostbyname/gethostname/gethostbyaddr to check
whether name resolution is set up correctly.

Rayson

