[GE users] New grid problems

Robert White alphamonk at gmail.com
Mon Jan 28 17:09:52 GMT 2008



I just changed from bash to tcsh and ran my simulation directly. When I run
the simulation on an execution host, the sim passes. When I qlogin into the
same execution host, the sim fails, complaining about a library file.

I redirected the output of env into a file on an execution host that I had
sshed into directly, and into another file after a qlogin into the exact same
execution host. The only environment differences are related to ssh.
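
For anyone repeating this comparison, here is a minimal sketch of the approach, assuming a POSIX shell is available on the execution host. The function names and the file paths in the usage note are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: dump the environment in a stable, diff-friendly order.
dump_env() {
    env | sort > "$1"
}

# Hypothetical helper: print only the lines that differ between two dumps.
# diff exits with status 1 when the files differ, which is expected here.
compare_env() {
    diff "$1" "$2"
}
```

Run `dump_env /tmp/env.ssh` in the ssh session and `dump_env /tmp/env.qlogin` in the qlogin session, then `compare_env /tmp/env.ssh /tmp/env.qlogin`. Sorting first keeps ordering differences out of the diff.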

Does anyone know of any memory limits that I may be running into? I think
this is a memory limit problem that I am seeing. I have not made any changes
to memory resources. All of my cluster queue limits are set to unlimited.
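
Environment variables do not capture resource limits, so the same kind of comparison can be made for the limits themselves. The sketch below assumes a POSIX-style shell whose `ulimit` builtin accepts `-S` and `-s` (bash and dash do); the function names are hypothetical. `run_with_stack_limit` is only a way to test whether the stack-size difference visible in the limits quoted below matters. It is not a confirmed fix.

```shell
#!/bin/sh
# Hypothetical helper: record the current shell's soft limits in a file.
dump_limits() {
    ulimit -a > "$1"
}

# Hypothetical helper: run a command with the soft stack limit set to $1 KB.
# The subshell keeps the change from leaking into the calling shell.
run_with_stack_limit() {
    kb=$1
    shift
    ( ulimit -S -s "$kb" && exec "$@" )
}
```

For example, `run_with_stack_limit 10240 ./run_sim` inside a qlogin session would rerun the simulation with the stack limit from the interactive bash shell. Here `./run_sim` stands in for the real simulation command.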

Thanks Bob

On Jan 25, 2008 10:28 PM, Robert White <alphamonk at gmail.com> wrote:

>
> I have been checking the differences between the two environments since
> this morning. I have a small perl script and a bash script that
> print LD_LIBRARY_PATH, USER, PATH, HOSTNAME, SHELL, and PWD, report
> the versions of perl, gcc, and c++, and check the date and time. The only
> difference I see is that I normally use a bash shell, while I am getting a
> csh shell from SGE-submitted jobs. I have set up .tcshrc to mirror my bashrc,
> with the appropriate changes for the differences between the shells. I
> am going to run a job from an execution host in the tcsh shell to see if it
> passes without going through SGE.
>
> I also noticed that my ulimit output is different when I submit a job with
> qsub or qrsh. My normal bash ulimit is:
> core file size        (blocks, -c) 0
> data seg size         (kbytes, -d) unlimited
> file size             (blocks, -f) unlimited
> max locked memory     (kbytes, -l) 4
> max memory size       (kbytes, -m) unlimited
> open files                    (-n) 1024
> pipe size          (512 bytes, -p) 8
> stack size            (kbytes, -s) 10240
> cpu time             (seconds, -t) unlimited
> max user processes            (-u) 7168
> virtual memory        (kbytes, -v) unlimited
>
>
> My ulimit in an SGE-submitted csh job is:
> core file size        (blocks, -c) unlimited
> data seg size         (kbytes, -d) unlimited
> file size             (blocks, -f) unlimited
> max locked memory     (kbytes, -l) 4
> max memory size       (kbytes, -m) unlimited
> open files                    (-n) 1024
> pipe size          (512 bytes, -p) 8
> stack size            (kbytes, -s) unlimited
> cpu time             (seconds, -t) unlimited
> max user processes            (-u) 7168
> virtual memory        (kbytes, -v) unlimited
>
>
>
>
>
> On Jan 25, 2008 9:46 PM, Chris Dagdigian <dag at sonsorol.org> wrote:
>
> > Hi Robert,
> >
> > Is Grid Engine running your job in the same shell that you are using
> > when you run the job via the command line? You can explicitly request
> > a certain shell by adding "-S /bin/sh" or similar to your submission
> > command. Depending on how SGE is configured it may be ignoring the first
> > line of your job script (the "#!/bin/something" part ...).
> >
> > One technique I occasionally will use is to create a simple shell
> > script that does nothing but print the current path and ENV variables
> > to the standard output. I then submit that to grid engine and compare
> > the job output to my local path and environment.
> >
> > Regardless, Rayson has you on the right path. These sorts of "jobs run
> > manually but not via SGE" problems are almost always due to shell,
> > environment, path or permission issues.  When you figure out what is
> > "different" about the two environments you'll have the answer to your
> > problem.
> >
> >
> > -Chris
> >
> >
> > On Jan 25, 2008, at 8:18 PM, Robert White wrote:
> >
> > > Hi Rayson,
> > >
> > > This is my library path after sshing directly into sisko.
> > > ##
> > > [robertw at sisko robertw]echo $LD_LIBRARY_PATH
> > > /tools/se/NEC/CSR_be90/ARM_DSM/32bit/latest/
> > > DSM_NBARM926C1616T00P9V10_lic_CB90M_ncverilog_Linux-32_1.0/
> > > simulation_models//ModelManager/MMAPI_5.0.1/Linux/MM/
> > > cadence_nc_verilog:/tools/novas/verdi/latest/share/PLI/systemc/
> > > ncsc53/lib-linux_gcc3_23:/tools/novas/verdi/latest/share/FsdbWriter/
> > > LINUX:/tools/novas/verdi/latest/share/PLI/nc51/LINUX/nc_loadpli1:/
> > > tools/cadence/ius58s3/tools/tbsc/lib/gnu:/tools/cadence/ius58s3/
> > > tools/systemc/gcc/3.2.3/install/lib:.:/tools/cadence/ius58s3//tools/
> > > inca/lib:/tools/cadence/ius58s3//tools/lib:/tools/cadence/ius58s3//
> > > tools/ict/Linux/pli/ncv1_21:/usr/lib:/usr/local/lib:/tools/ActiveTcl/
> > > lib:/tools/denali/denali_3.2.050/verilog:/tools/denali/
> > > denali_3.2.050/ddvapi:/tools/vera/vera-6.3.10-linux2.4.7/lib:/tools/
> > > cadence/ius58s3/tools/systemc/gcc/3.2.3/install/lib
> > > ###
> > >
> > > This is my library path after qlogining into sisko
> > > ###
> > > [robertw at sisko robertw]echo $LD_LIBRARY_PATH
> > > /tools/se/NEC/CSR_be90/ARM_DSM/32bit/latest/
> > > DSM_NBARM926C1616T00P9V10_lic_CB90M_ncverilog_Linux-32_1.0/
> > > simulation_models//ModelManager/MMAPI_5.0.1/Linux/MM/
> > > cadence_nc_verilog:/tools/novas/verdi/latest/share/PLI/systemc/
> > > ncsc53/lib-linux_gcc3_23:/tools/novas/verdi/latest/share/FsdbWriter/
> > > LINUX:/tools/novas/verdi/latest/share/PLI/nc51/LINUX/nc_loadpli1:/
> > > tools/cadence/ius58s3/tools/tbsc/lib/gnu:/tools/cadence/ius58s3/
> > > tools/systemc/gcc/3.2.3/install/lib:.:/tools/cadence/ius58s3//tools/
> > > inca/lib:/tools/cadence/ius58s3//tools/lib:/tools/cadence/ius58s3//
> > > tools/ict/Linux/pli/ncv1_21:/usr/lib:/usr/local/lib:/tools/ActiveTcl/
> > > lib:/tools/denali/denali_3.2.050/verilog:/tools/denali/
> > > denali_3.2.050/ddvapi:/tools/vera/vera-6.3.10-linux2.4.7/lib:/tools/
> > > cadence/ius58s3/tools/systemc/gcc/3.2.3/install/lib
> > > ###
> > >
> > > A diff of the two doesn't show any differences.
> > >
> > > Bob
> > >
> > >
> > > On Jan 25, 2008 6:47 PM, Rayson Ho <rayrayson at gmail.com> wrote:
> > > Can you check the LD_LIBRARY_PATH difference between your shell and
> > > the SGE job environment??
> > >
> > > Rayson
> > >
> > >
> > >
> > > On Jan 25, 2008 7:29 PM, Robert White <alphamonk at gmail.com> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > I have a job script that runs correctly when I run the application from the
> > > > command line on an execution host without sending the app through SGE. If I
> > > > run the job through the grid using qrsh, qsub, or qlogin, I receive the
> > > > same error each time. The error complains about a library file. This is the
> > > > error message: "libdenpli.so: failed cannot open shared object file: No such file
> > > > or directory or file is not valid ELFCLASS32 library"
> > > >
> > > > When I run the command on an execution host and use strace to see if it
> > > > finds this library, this is what I see.
> > > > open("/tools/denali/denali_3.2.050/verilog/libdenpli.so", O_RDONLY) = 10
> > > > read(10, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300\275"..., 512) = 512
> > > > fstat64(10, {st_mode=S_IFREG|0755, st_size=19806385, ...}) = 0
> > > > old_mmap(NULL, 27237300, PROT_READ|PROT_EXEC, MAP_PRIVATE, 10, 0) = 0x32fae000
> > > > old_mmap(0x33fc4000, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 10, 0x1015000) = 0x33fc4000
> > > > old_mmap(0x340c5000, 9317300, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x340c5000
> > > > close(10)                               = 0
> > > >
> > > >
> > > >
> > > > When I did a qlogin and ran the command by hand under strace, I found this
> > > > at the point where the library file is accessed.
> > > > open("/tools/denali/denali_3.2.050/verilog/libdenpli.so", O_RDONLY) = 10
> > > > read(10, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300\275"..., 512) = 512
> > > > fstat64(10, {st_mode=S_IFREG|0755, st_size=19806385, ...}) = 0
> > > > old_mmap(NULL, 27237300, PROT_READ|PROT_EXEC, MAP_PRIVATE, 10, 0) = -1 ENOMEM (Cannot allocate memory)
> > > > close(10)                               = 0
> > > >
> > > > Does anyone know what this problem could be? Are there any memory
> > > > limitations somewhere that I am not aware of? This is a new
> > > > install of SGE 6.1u2 running on RHEL3.9 computers.
> > > >
> > > > Thanks Bob
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > > For additional commands, e-mail: users-help at gridengine.sunsource.net
> > >
> > >
> >
> >
> >
> >
>


