[GE users] LD_LIBRARY_PATH network issues

John Saalwaechter bababooey182 at yahoo.com
Tue Sep 28 05:13:11 BST 2004


Synopsis: LD_LIBRARY_PATH considered harmful.

I've got some observations and questions about SGE's
use of LD_LIBRARY_PATH.

First, all of our users source $SGE_ROOT/default/common/settings.sh,
as expected.  This script sets, or appends to, LD_LIBRARY_PATH.
On the SGEEE 5.3p6 that we're using, the only shared library in
the path added by SGE is libXltree.so.  A little poking around
shows that this library is only needed by the qmon GUI program.
I'm guessing that it's used to render the tree diagrams in qmon.

In our environment we use the shadow master feature of SGE 5.3,
and we put $SGE_ROOT on a highly available, high-performance NFS
NAS device.  This allows for the failover to work properly, and we
also NFS-mount $SGE_ROOT to all the grid nodes.

The problem is this: with LD_LIBRARY_PATH set to
$SGE_ROOT/lib/glinux or $SGE_ROOT/lib/solaris64 in everyone's
environment, *every* single command they run sends NFS packets to
the NAS device checking for shared libraries.  Every cat, grep, ls,
cp, etc. sends NFS lookup requests to the NAS device!  In a grid with
hundreds of systems, this translates into a significant network load,
and an unnecessary dependency.  To make matters worse, SGE
automatically sets LD_LIBRARY_PATH in all grid jobs, too.  So if a job
script is a complex shell script with lots of UNIX commands, and it's
submitted as a big job array, one can end up with hundreds of compute
nodes pounding the NFS server with unnecessary NFS requests.

To top it off, this is all to support one portion of qmon.  Qmon is
rarely, if ever, used in our environment.

Question:  We're planning to modify settings.sh and settings.csh to
simply not set LD_LIBRARY_PATH at all.  But I could not find an
explicit way to prevent SGE from setting LD_LIBRARY_PATH in jobs.
Anyone know the best way to do that?
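
Short of an SGE-side setting, one workaround might be to scrub the
SGE library directory out of LD_LIBRARY_PATH at the top of each job
script. The following is only a sketch: strip_path_prefix is a
hypothetical helper name, and it assumes every SGE-added entry lives
under $SGE_ROOT/lib.

```shell
# Hypothetical helper: print a colon-separated path list with every
# entry equal to, or located under, the given directory removed.
# Empty entries are dropped as well.
strip_path_prefix() {
    # $1 = colon-separated list, $2 = directory prefix to drop
    rest=$1:
    result=
    while [ -n "$rest" ]; do
        entry=${rest%%:*}      # first entry of the remaining list
        rest=${rest#*:}        # everything after the first colon
        case $entry in
            ""|"$2"|"$2"/*) ;;                       # skip empty and SGE entries
            *) result=${result:+$result:}$entry ;;   # keep everything else
        esac
    done
    printf '%s\n' "$result"
}

# At the top of a job script (does nothing if SGE_ROOT is unset):
if [ -n "${SGE_ROOT:-}" ]; then
    LD_LIBRARY_PATH=$(strip_path_prefix "${LD_LIBRARY_PATH:-}" "$SGE_ROOT/lib")
    [ -n "$LD_LIBRARY_PATH" ] || unset LD_LIBRARY_PATH
fi
```

This only helps the commands the job script starts afterwards; the
job's own shell has already been loaded by then.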

Recommendations: I'd recommend two options to permanently solve
this problem.
1. If libXltree.so is really the only shared library in the whole
SGE system, just statically link it into qmon and forget about
LD_LIBRARY_PATH completely.
2. If there truly is a need for dynamic libraries, make qmon a
shell script that sets LD_LIBRARY_PATH and then execs the real
qmon executable. And avoid setting LD_LIBRARY_PATH anywhere else.
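
A rough sketch of what option 2 could look like follows. The names
are assumptions, not anything SGE ships: the real binary is taken to
be renamed qmon.real and to live in $SGE_ROOT/bin/<arch>, and
SGE_ARCH is a made-up variable standing in for however the
architecture directory (glinux, solaris64, ...) gets chosen.

```shell
#!/bin/sh
# Sketch of a wrapper installed under the name "qmon".

# Print the library path qmon should see: the SGE library directory,
# followed by whatever LD_LIBRARY_PATH the caller already had.
qmon_lib_path() {
    # $1 = SGE root, $2 = architecture directory, $3 = existing path (may be empty)
    printf '%s\n' "$1/lib/$2${3:+:$3}"
}

qmon_main() {
    arch=${SGE_ARCH:-glinux}   # SGE_ARCH is an assumed, hypothetical variable
    LD_LIBRARY_PATH=$(qmon_lib_path "$SGE_ROOT" "$arch" "${LD_LIBRARY_PATH:-}")
    export LD_LIBRARY_PATH
    # qmon.real is a hypothetical name for the renamed original binary
    exec "$SGE_ROOT/bin/$arch/qmon.real" "$@"
}

# Run only when invoked under the name "qmon", so that the file can
# also be sourced without side effects.
case ${0##*/} in
    qmon) qmon_main "$@" ;;
esac
```

With a wrapper like this, only qmon itself ever sees the SGE library
directory, and settings.sh and settings.csh can leave
LD_LIBRARY_PATH alone.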

Here is an example of this problem shown on a RH9 Linux system:

rh9% strace -eopen /bin/cat /dev/null
open("/etc/ld.so.preload", O_RDONLY)    = -1 ENOENT (No such file or directory)
open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/tls/i686/mmx/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/tls/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/tls/mmx/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/tls/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/i686/mmx/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/mmx/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/lib/tls/libc.so.6", O_RDONLY)    = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
open("/dev/null", O_RDONLY|O_LARGEFILE) = 3
rh9%
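
That is eight failed lookups under the SGE library directory for a
single cat. To put a number on it for other commands, something
along these lines might do; count_failed_opens is a hypothetical
helper name, and it simply counts ENOENT lines in strace output that
mention a path under the given directory.

```shell
# Hypothetical helper: read strace output on stdin and print how many
# failed (ENOENT) lookups refer to a path under the given directory.
count_failed_opens() {
    # $1 = directory prefix, e.g. "$SGE_ROOT/lib"
    grep -F "\"$1/" | grep -c 'ENOENT'
}

# Example use (strace writes its trace to stderr, hence 2>&1):
#   strace -eopen /bin/cat /dev/null 2>&1 | count_failed_opens "$SGE_ROOT/lib"
```

Multiplying the result by the number of commands in a job script and
the number of tasks in a job array gives a rough idea of the extra
NFS lookups a single submission generates.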


=====
--
John Saalwaechter <bababooey182 at yahoo.com>


		