[GE users] New grid problems

Robert White alphamonk at gmail.com
Sat Jan 26 04:28:59 GMT 2008


I have been checking the differences between the two environments since this
morning. I have a small perl script and a bash script that print
LD_LIBRARY_PATH, USER, PATH, HOSTNAME, SHELL, and PWD, report the versions
of perl, gcc, and c++, and check the date and time. The only difference I
see is that I normally use a bash shell, while jobs submitted through SGE
get a csh shell. I have set up .tcshrc to mirror my .bashrc, with the
appropriate changes for the differences between the two shells. I am going
to run a job from an execution host in the tcsh shell, without going through
SGE, to see if it passes.
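
A minimal sketch of such an environment report might look like the
following. It is an illustration, not the script used here: the function
name env_report is made up, and the "#$ -S /bin/bash" line assumes bash
lives at /bin/bash on the execution hosts.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -j y

# Print each variable discussed in this thread as NAME=value.
# An unset variable prints as "NAME=" so the line still shows up in a diff.
env_report() {
    for name in LD_LIBRARY_PATH USER PATH HOSTNAME SHELL PWD; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name-}"
        printf '%s=%s\n' "$name" "$value"
    done
}

env_report
# Resource limits can differ between a login shell and a job, so record them too.
ulimit -a
```

Saved under a hypothetical name such as env_report.sh, it could be run once
by hand (./env_report.sh > local.txt) and once through the grid
(qsub -cwd -o sge.txt env_report.sh); a diff of the two output files then
shows which settings differ.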

I also noticed that my ulimit is different when I submit a job with qsub or
qrsh. My normal bash ulimit is:
core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) 4
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) 10240
cpu time             (seconds, -t) unlimited
max user processes            (-u) 7168
virtual memory        (kbytes, -v) unlimited


My ulimit in the csh shell of a job submitted through SGE is:
core file size        (blocks, -c) unlimited
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) 4
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) unlimited
cpu time             (seconds, -t) unlimited
max user processes            (-u) 7168
virtual memory        (kbytes, -v) unlimited
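
The two listings above can also be compared mechanically. The sketch below
assumes each listing has been saved to a file with "ulimit -a > FILE"; the
function name limit_diff and the file names in the comments are
hypothetical.

```shell
# limit_diff A B: print only the lines that differ between two saved
# `ulimit -a` listings ("<" marks lines from A, ">" marks lines from B).
limit_diff() {
    diff "$1" "$2" | grep '^[<>]'
}

# Example with hypothetical file names:
#   ulimit -a > limits_login.txt    # run in the normal login shell
#   ulimit -a > limits_sge.txt      # run from inside a submitted job
#   limit_diff limits_login.txt limits_sge.txt
```

Applied to the two listings in this message, only the core file size and
stack size lines are printed: both are unlimited in the SGE job, against 0
and 10240 in the login shell. To test whether one of them matters, a bash
job script could set it back before starting the tool, for example with
"ulimit -s 10240".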




On Jan 25, 2008 9:46 PM, Chris Dagdigian <dag at sonsorol.org> wrote:

> Hi Robert,
>
> Is Grid Engine running your job in the same shell that you are using
> when you run the job via the command line? You can explicitly request
> a certain shell by adding "-S /bin/sh" or similar to your submission
> command. Depending on how SGE is configured it may be ignoring the first
> line of your job script (the "#!/bin/something" part ...).
>
> One technique I occasionally will use is to create a simple shell
> script that does nothing but print the current path and ENV variables
> to the standard output. I then submit that to grid engine and compare
> the job output to my local path and environment.
>
> Regardless, Rayson has you on the right path. These sorts of "jobs run
> manually but not via SGE" problems are almost always due to shell,
> environment, path or permission issues.  When you figure out what is
> "different" about the two environments you'll have the answer to your
> problem.
>
>
> -Chris
>
>
> On Jan 25, 2008, at 8:18 PM, Robert White wrote:
>
> > Hi Rayson,
> >
> > This is my library path after sshing directly into sisko.
> > ##
> > [robertw@sisko robertw]echo $LD_LIBRARY_PATH
> > /tools/se/NEC/CSR_be90/ARM_DSM/32bit/latest/
> > DSM_NBARM926C1616T00P9V10_lic_CB90M_ncverilog_Linux-32_1.0/
> > simulation_models//ModelManager/MMAPI_5.0.1/Linux/MM/
> > cadence_nc_verilog:/tools/novas/verdi/latest/share/PLI/systemc/
> > ncsc53/lib-linux_gcc3_23:/tools/novas/verdi/latest/share/FsdbWriter/
> > LINUX:/tools/novas/verdi/latest/share/PLI/nc51/LINUX/nc_loadpli1:/
> > tools/cadence/ius58s3/tools/tbsc/lib/gnu:/tools/cadence/ius58s3/
> > tools/systemc/gcc/3.2.3/install/lib:.:/tools/cadence/ius58s3//tools/
> > inca/lib:/tools/cadence/ius58s3//tools/lib:/tools/cadence/ius58s3//
> > tools/ict/Linux/pli/ncv1_21:/usr/lib:/usr/local/lib:/tools/ActiveTcl/
> > lib:/tools/denali/denali_3.2.050/verilog:/tools/denali/
> > denali_3.2.050/ddvapi:/tools/vera/vera-6.3.10-linux2.4.7/lib:/tools/
> > cadence/ius58s3/tools/systemc/gcc/3.2.3/install/lib
> > ###
> >
> > This is my library path after qlogining into sisko
> > ###
> > [robertw@sisko robertw]echo $LD_LIBRARY_PATH
> > /tools/se/NEC/CSR_be90/ARM_DSM/32bit/latest/
> > DSM_NBARM926C1616T00P9V10_lic_CB90M_ncverilog_Linux-32_1.0/
> > simulation_models//ModelManager/MMAPI_5.0.1/Linux/MM/
> > cadence_nc_verilog:/tools/novas/verdi/latest/share/PLI/systemc/
> > ncsc53/lib-linux_gcc3_23:/tools/novas/verdi/latest/share/FsdbWriter/
> > LINUX:/tools/novas/verdi/latest/share/PLI/nc51/LINUX/nc_loadpli1:/
> > tools/cadence/ius58s3/tools/tbsc/lib/gnu:/tools/cadence/ius58s3/
> > tools/systemc/gcc/3.2.3/install/lib:.:/tools/cadence/ius58s3//tools/
> > inca/lib:/tools/cadence/ius58s3//tools/lib:/tools/cadence/ius58s3//
> > tools/ict/Linux/pli/ncv1_21:/usr/lib:/usr/local/lib:/tools/ActiveTcl/
> > lib:/tools/denali/denali_3.2.050/verilog:/tools/denali/
> > denali_3.2.050/ddvapi:/tools/vera/vera-6.3.10-linux2.4.7/lib:/tools/
> > cadence/ius58s3/tools/systemc/gcc/3.2.3/install/lib
> > ###
> >
> > A diff of the two doesn't show any differences.
> >
> > Bob
> >
> >
> > On Jan 25, 2008 6:47 PM, Rayson Ho <rayrayson at gmail.com> wrote:
> > Can you check the LD_LIBRARY_PATH difference between your shell and
> > the SGE job environment??
> >
> > Rayson
> >
> >
> >
> > On Jan 25, 2008 7:29 PM, Robert White <alphamonk at gmail.com> wrote:
> > >
> > > Hi All,
> > >
> > > I have a job script that runs correctly when I run the application from
> > > the command line on an execution host without sending the app through
> > > SGE. If I run the job through the grid using qrsh, qsub, or qlogin, I
> > > receive the same error each time. The error complains about a library
> > > file. This is the error message: "libdenpli.so: failed cannot open
> > > shared object file: No such file or directory or file is not valid
> > > ELFCLASS32 library"
> > >
> > > When I run the command on an execution host and use strace to see if it
> > > finds this library, this is what I see:
> > > open("/tools/denali/denali_3.2.050/verilog/libdenpli.so", O_RDONLY) = 10
> > > read(10, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300\275"..., 512) = 512
> > > fstat64(10, {st_mode=S_IFREG|0755, st_size=19806385, ...}) = 0
> > > old_mmap(NULL, 27237300, PROT_READ|PROT_EXEC, MAP_PRIVATE, 10, 0) = 0x32fae000
> > > old_mmap(0x33fc4000, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 10, 0x1015000) = 0x33fc4000
> > > old_mmap(0x340c5000, 9317300, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x340c5000
> > > close(10)                               = 0
> > >
> > >
> > >
> > > When I did a qlogin and ran the command by hand using strace, I found
> > > this at the point where the library file is being accessed:
> > > open("/tools/denali/denali_3.2.050/verilog/libdenpli.so", O_RDONLY) = 10
> > > read(10, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300\275"..., 512) = 512
> > > fstat64(10, {st_mode=S_IFREG|0755, st_size=19806385, ...}) = 0
> > > old_mmap(NULL, 27237300, PROT_READ|PROT_EXEC, MAP_PRIVATE, 10, 0) = -1 ENOMEM (Cannot allocate memory)
> > > close(10)                               = 0
> > >
> > > Does anyone know what this problem could be? Are there any memory
> > > limitations somewhere that I am not aware of? This is a new install of
> > > SGE 6.1u2 running on RHEL3.9 computers.
> > >
> > > Thanks Bob
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
> >


