[GE users] LD_LIBRARY_PATH network issues

Andy Schwierskott andy.schwierskott at sun.com
Tue Sep 28 09:32:05 BST 2004


John,

> Synopsis: LD_LIBRARY_PATH considered harmful.
>
> I've got some observations and questions about SGE's
> use of LD_LIBRARY_PATH.
>
> First, all of our users source $SGE_ROOT/default/common/settings.sh,
> as expected.  This script sets, or appends to, LD_LIBRARY_PATH.
> On the SGEEE 5.3p6 that we're using, the only shared library in
> the path added by SGE is libXltree.so.  A little poking around
> shows that this library is only needed by the qmon GUI program.
> I'm guessing that it's used to render the tree diagrams in qmon.
>
> In our environment we use the shadow master feature of SGE 5.3,
> and we put $SGE_ROOT on a highly available, high-performance NFS
> NAS device.  This allows for the failover to work properly, and we
> also NFS-mount $SGE_ROOT to all the grid nodes.
>
> The problem is this: with LD_LIBRARY_PATH set to
> $SGE_ROOT/lib/glinux or $SGE_ROOT/lib/solaris64 in everyone's
> environment, *every* single command they run sends NFS packets to
> the NAS device checking for shared libraries.  Every cat, grep, ls,
> cp, etc. sends NFS lookup requests to the NAS device!  In a grid with
> hundreds of systems, this translates into a significant network load,
> and an unnecessary dependency.  To make matters worse, SGE
> automatically sets LD_LIBRARY_PATH in all grid jobs, too.  So if a job
> script is a complex shell script with lots of UNIX commands, and it's
> submitted as a big job array, one can end up with hundreds of compute
> nodes pounding the NFS server with unnecessary NFS requests.
>
> To top it off, this is all to support one portion of qmon.  Qmon is
> rarely, if ever, used in our environment.
>
> Question:  We're planning to modify settings.sh and settings.csh to
> simply not set LD_LIBRARY_PATH at all.  But I could not find an
> explicit way to prevent SGE from setting LD_LIBRARY_PATH in jobs.
> Anyone know the best way to do that?

This setting is hard coded in

   libs/sgeobj/sge_var.c:var_list_set_sharedlib_path()

which is called from daemons/execd/exec_job.c. As a result, the
"environment" file for the shepherd contains this setting.
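
As long as the setting is hard coded, one possible workaround is to strip
the SGE library directory from LD_LIBRARY_PATH at the top of the job script.
The following is only a sketch: strip_path_entry is a hypothetical helper,
not part of SGE, and it only affects commands started after it has run (the
job's own shell has already been started with the original value).

```shell
# strip_path_entry DIR LIST
# Print the colon-separated LIST with every entry equal to DIR removed.
# Empty entries are dropped as well.
# Hypothetical helper for illustration; not part of SGE.
strip_path_entry() {
    dir=$1
    list=$2
    result=
    old_ifs=$IFS
    IFS=:
    for entry in $list; do
        [ "$entry" = "$dir" ] && continue
        [ -z "$entry" ] && continue
        result=${result:+$result:}$entry
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}
```

In a job script on Linux it might be used like this:

    LD_LIBRARY_PATH=$(strip_path_entry "$SGE_ROOT/lib/glinux" "$LD_LIBRARY_PATH")
    export LD_LIBRARY_PATH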

> Recommendations: I'd recommend two options to permanently solve
> this problem.
> 1. If libXltree.so is really the only shared library in the whole
> SGE system, just statically link it in to qmon and forget about
> LD_LIBRARY_PATH completely.

There are a bunch of other shared libraries used today (and we might use
more in the future, e.g. put SGE libs in shared libs).

However, I agree that technically it is not really necessary on most OSes.
For example, qmaster loads the spooling libraries via dlopen(), and dlopen()
accepts a full path, which could be built into the code.

> 2. If there truly is a need for dynamic libraries, make qmon a
> shell script that sets LD_LIBRARY_PATH and then execs the real
> qmon executable. And avoid setting LD_LIBRARY_PATH anywhere else.

or use dlopen() where possible in the code.
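
A wrapper of the kind described in option 2 could look roughly like this.
It is a minimal sketch under assumptions: run_with_libdir is a hypothetical
helper name, and the file name qmon.bin used below for the renamed real
executable is made up for illustration, not something SGE ships.

```shell
# run_with_libdir DIR COMMAND [ARGS...]
# Run COMMAND with DIR prepended to LD_LIBRARY_PATH. The variable is set
# only in the environment of COMMAND; the calling shell is not changed.
# Hypothetical helper for illustration; not part of SGE.
run_with_libdir() {
    libdir=$1
    shift
    env LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}
```

A qmon wrapper script would then end with a line such as

    run_with_libdir "$SGE_ROOT/lib/glinux" "$SGE_ROOT/bin/glinux/qmon.bin" "$@"

and a standalone wrapper would normally put "exec" in front of the env call
so that no extra shell process stays around.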


> Here is an example of this problem shown on a RH9 Linux system:
>
> rh9% strace -eopen /bin/cat /dev/null
> open("/etc/ld.so.preload", O_RDONLY)    = -1 ENOENT (No such file or directory)
> open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/tls/i686/mmx/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/tls/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/tls/mmx/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/tls/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/i686/mmx/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/mmx/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("/xxxxxxxxxx/sgeee_5.3/lib/glinux/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("/etc/ld.so.cache", O_RDONLY)      = 3
> open("/lib/tls/libc.so.6", O_RDONLY)    = 3
> open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
> open("/dev/null", O_RDONLY|O_LARGEFILE) = 3
> rh9%

I see the point, but I also see that Linux uses a very complex search logic
here.
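
The complexity comes from the dynamic linker trying several hardware
capability subdirectories (tls, i686, mmx and their combinations) under each
LD_LIBRARY_PATH entry, which is why a single entry produces eight failed
opens for libc.so.6 in the trace above. To get a rough number for a given
command, the failed lookups can be counted from the strace output. This is
only a sketch; count_failed_lookups is a hypothetical helper name.

```shell
# count_failed_lookups PREFIX
# Read strace output on stdin and print the number of lines that refer to
# a path starting with PREFIX and failed with ENOENT.
# Hypothetical helper for illustration; not part of SGE or strace.
count_failed_lookups() {
    prefix=$1
    awk -v p="\"$prefix" 'index($0, p) && /ENOENT/ { n++ } END { print n + 0 }'
}
```

Assuming strace is installed, it could be fed like this:

    strace -e trace=file /bin/cat /dev/null 2>&1 | count_failed_lookups "$SGE_ROOT/lib/glinux"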

Is there some means of registering the SGE shared libs with the system so
that they end up in /etc/ld.so.cache, which would get rid of the problem?
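
On Linux, one way this might be done is to list the SGE library directory
in /etc/ld.so.conf and rebuild the cache with ldconfig. The following is a
sketch only: register_libdir is a hypothetical helper, it needs root
privileges when used on the real configuration file, and it would have to
be run on every host. It also only helps once LD_LIBRARY_PATH is no longer
set, because LD_LIBRARY_PATH is searched before the cache.

```shell
# register_libdir DIR [CONF]
# Append DIR to the linker configuration file CONF (default:
# /etc/ld.so.conf) unless an identical line is already present.
# Hypothetical helper for illustration; not part of SGE.
register_libdir() {
    libdir=$1
    conf=${2:-/etc/ld.so.conf}
    grep -qxF -- "$libdir" "$conf" 2>/dev/null || printf '%s\n' "$libdir" >> "$conf"
}
```

As root, the registration would then be followed by a cache rebuild:

    register_libdir "$SGE_ROOT/lib/glinux" && ldconfig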

Your points are valid and we should discuss a workable solution. It is
similar on Solaris, for example: every path in LD_LIBRARY_PATH is checked.

% truss -tstat ls

stat("/usr/bin/ls", 0xFFBFE210)                 = 0
stat("/xxxxx/InhouseSystems/n1ge6u1/lib/sol-sparc64/libc_ut.so", 0xFFBFDBAC) Err#2 ENOENT
stat("/usr/dt/lib/libc_ut.so", 0xFFBFDBAC)      Err#2 ENOENT
stat("/usr/openwin/lib/libc_ut.so", 0xFFBFDBAC) Err#2 ENOENT
stat("/usr/lib/libc_ut.so", 0xFFBFDBAC)         = 0
stat("/xxxxx/InhouseSystems/n1ge6u1/lib/sol-sparc64/libc.so.1", 0xFFBFDB54) Err#2 ENOENT
stat("/usr/dt/lib/libc.so.1", 0xFFBFDB54)       Err#2 ENOENT
stat("/usr/openwin/lib/libc.so.1", 0xFFBFDB54)  Err#2 ENOENT
stat("/usr/lib/libc.so.1", 0xFFBFDB54)          = 0
stat("/xxxxx/InhouseSystems/n1ge6u1/lib/sol-sparc64/libdl.so.1", 0xFFBFDB54) Err#2 ENOENT
stat("/usr/dt/lib/libdl.so.1", 0xFFBFDB54)      Err#2 ENOENT
stat("/usr/openwin/lib/libdl.so.1", 0xFFBFDB54) Err#2 ENOENT
stat("/usr/lib/libdl.so.1", 0xFFBFDB54)         = 0
stat("/usr/platform/SUNW,Ultra-Enterprise/lib/libc_psr.so.1", 0xFFBFD914) = 0


I think it's definitely worth looking for a long-term solution.

> =====
> --
> John Saalwaechter <bababooey182 at yahoo.com>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>


Regards,
Andy Schwierskott

--
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Andy Schwierskott           Tel: +49 (0)941 3075-200 (x60200)
N1 Grid Engine Engineering  Fax: +49 (0)941 3075-222 (x60222)
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7       mailto:andy.schwierskott at sun.com
D-93049 Regensburg          http://www.sun.com/gridware

