[GE users] run time intel compiler library libsvml not found

Reuti reuti at staff.uni-marburg.de
Fri Jan 18 10:40:10 GMT 2008


On 18.01.2008 at 00:29, SLIM H.A. wrote:

> I am using mpich1 over Ethernet here. job_is_first_task is FALSE
> and that gives me n-1 instances of qrsh on the master node.
> This has been the setup all along. If I change
> job_is_first_task to TRUE, the job crashes. This behaviour
> contradicts the section "Number of tasks spread to the nodes". The
> device is ch_p4.

Then there are a few other things to check:

- What application is it? Turbomole, for example, always needs one
process more than the user wants to use.
- Are you using -nolocal with mpirun?
- Can you please post the relevant lines of `ps -e f` (with a space
between -e and f) from the master node?
- What kind of failure does the job crash with, i.e. which error message?
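The slot arithmetic behind these questions can be sketched in POSIX sh. The job script in this thread is csh, and sh is used here only for illustration. The sketch assumes that mpirun for the ch_p4 device starts rank 0 on the master node itself and every other rank through the rsh wrapper, and that -nolocal sends rank 0 through the wrapper as well. The function names are hypothetical, and SGE really enforces the limit per host rather than as one total.

```shell
#!/bin/sh
# Hypothetical helper: compare the number of "qrsh -inherit" calls SGE
# permits for a PE with the number mpirun (ch_p4) is ASSUMED to issue.
# Totals only; SGE actually enforces the limit per host.

# allowed_qrsh SLOTS JOB_IS_FIRST_TASK  (TRUE or FALSE, as in the PE)
allowed_qrsh() {
    if [ "$2" = "TRUE" ]; then
        # the job script itself counts as one of the tasks
        echo $(( $1 - 1 ))
    else
        echo "$1"
    fi
}

# needed_qrsh SLOTS NOLOCAL  (yes if mpirun is called with -nolocal)
# Assumption: without -nolocal rank 0 runs locally and SLOTS-1 ranks
# go through the rsh wrapper; with -nolocal all SLOTS ranks do.
needed_qrsh() {
    if [ "$2" = "yes" ]; then
        echo "$1"
    else
        echo $(( $1 - 1 ))
    fi
}

# check_pe SLOTS JOB_IS_FIRST_TASK NOLOCAL -> prints "ok" or "too-many"
check_pe() {
    allowed=$(allowed_qrsh "$1" "$2")
    needed=$(needed_qrsh "$1" "$3")
    if [ "$needed" -le "$allowed" ]; then
        echo ok
    else
        echo too-many
    fi
}

check_pe 4 TRUE no     # 3 needed, 3 allowed -> ok
check_pe 4 TRUE yes    # 4 needed, 3 allowed -> too-many
check_pe 4 FALSE yes   # 4 needed, 4 allowed -> ok
```

Under these assumptions, job_is_first_task TRUE without -nolocal should fit exactly. A crash in that setup would then point to something else, such as an application that starts an extra process or an mpirun option, which is what the questions above try to narrow down.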

-- Reuti


> Thanks
> Henk
>
> From: Reuti [mailto:reuti at Staff.Uni-Marburg.DE]
> Sent: Thu 1/17/2008 6:36 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] run time intel compiler library libsvml not found
>
> Hi,
>
> On 17.01.2008 at 18:29, SLIM H.A. wrote:
>
> > Apologies for the long delay in replying. I checked the web page you
> > referred to and the -V option solves the problem, thanks.
> > However, I noticed something curious: we use standard MPICH over
> > Ethernet with sge/mpi/startmpi.sh -catch_rsh $pe_hostfile as the PE
> > start script.
> > If I set
> >
> > job_is_first_task TRUE
>
> This only adjusts the number of qrsh calls SGE allows under its
> control: "n" with job_is_first_task FALSE, or "n-1" with
> job_is_first_task TRUE.
>
> Are you using plain MPICH(1) on a) Ethernet or b) Myrinet?
>
> -- Reuti
>
>
> > in the definition of the PE, as suggested on the web page, MPICH
> > generates error messages. I have to set
> >
> > control_slaves    TRUE
> > job_is_first_task FALSE
> >
> > to get it to work. Why is that?
> >
> > Thanks
> >
> > Henk
> >
> >> -----Original Message-----
> >> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> >> Sent: 21 December 2007 13:36
> >> To: users at gridengine.sunsource.net
> >> Subject: Re: [GE users] run time intel compiler library libsvml not found
> >>
> >> On 21.12.2007 at 13:04, SLIM H.A. wrote:
> >>
> >>> Maybe it clarifies if I show the script:
> >>>
> >>> #!/bin/csh
> >>> ... some standard sge options here
> >>> #$ -cwd
> >>> setenv MPICH_PROCESS_GROUP no
> >>> # request submission to a queue for parallel jobs
> >>> #$ -q par.q
> >>> ##$ -S /bin/csh
> >>
> >> This is just an ordinary comment: the line does not start with #$.
> >>
> >>> #   ^^ no effect
> >>> # set up the mpich version to use
> >>> # load the modules
> >>> module purge
> >>> module load intel/fce/9.0.032 mpich/ge/intel/64/1.2.7 sge/6.0u7_1
> >>> ldd ./monte
> >>> echo LD_LIBRARY_PATH=$LD_LIBRARY_PATH
> >>> # $ -v LD_LIBRARY_PATH=$LD_LIBRARY_PATH
> >>
> >> You can only use this on the command line, where the shell expands
> >> $LD_LIBRARY_PATH. In the script you would see a literal
> >> $LD_LIBRARY_PATH echoed, unless -V is used. (A space between # and $
> >> is also not allowed.)
> >>
> >>> #   ^^ no effect
> >>> #$ -V
> >>> #   ^^ only works if the session shell has the module loaded as well
> >>
> >> Seems okay.
> >>
> >>> # execute command
> >>> mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./monte
> >>>
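Reuti's remarks about "##$", "# $" and the unexpanded $LD_LIBRARY_PATH can be illustrated with a simplified model of how qsub picks up embedded options. This is an assumption-laden POSIX sh sketch, not qsub's actual parser. It only treats lines beginning exactly with the default prefix #$ as options (qsub -C can change that prefix) and takes the rest of the line literally, without any shell expansion. The function name sge_directives is hypothetical.

```shell
#!/bin/sh
# Simplified model (an assumption, not qsub's code) of embedded options:
# only lines starting exactly with "#$" count, so "# $ -v ..." and
# "##$ -S ..." stay plain comments. The option text is taken literally,
# so $LD_LIBRARY_PATH is never expanded here.

# sge_directives FILE: print the option text of each "#$" line
sge_directives() {
    while IFS= read -r line; do
        case $line in
            '#$'*)
                rest=${line#??}                    # drop the "#$" prefix
                rest=${rest#"${rest%%[! ]*}"}      # drop leading blanks
                printf '%s\n' "$rest"
                ;;
        esac
    done < "$1"
}

# Demo on a cut-down version of the job script above
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#!/bin/csh
#$ -cwd
# request submission to a queue for parallel jobs
#$ -q par.q
##$ -S /bin/csh
# $ -v LD_LIBRARY_PATH=$LD_LIBRARY_PATH
#$ -V
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./monte
EOF
sge_directives "$tmp"    # prints: -cwd, -q par.q, -V (one per line)
rm -f "$tmp"
```

In this model, the "##$ -S /bin/csh" and "# $ -v ..." lines are silently ignored. This matches the "no effect" observations in the script.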
> >>> I built monte with
> >>>
> >>> module purge
> >>> module load intel/fce/9.0.032 mpich/ge/intel/64/1.2.7
> >>> mpif90 monte.f90 -o monte
> >>>
> >>> These are snippets from the output file ...
> >>>         libsvml.so => /usr/local/Cluster-Apps/intel/fce/9.0//lib/libsvml.so (0x00002b21417de000)
> >>> ...
> >>> LD_LIBRARY_PATH=/usr/local/lib:/usr/X11R6/lib:/usr/local/Cluster-Apps/intel/fce/9.0//lib:/usr/local/Cluster-Apps/mpich/ge/intel/64/1.2.7/lib/shared:/usr/local/Cluster-Apps/sge/lib/lx26-amd64
> >>> /usr/local/Cluster-Apps/sge/bin/lx24-amd64/qrsh -inherit -nostdin node231 /data/hamilton/drk1has/hamilton_montepi/amd64_lnx_ifort/./monte node231 50375 \-p4amslave \-p4yourname node231 \-p4rmrank 1
> >>>
> >>> /data/hamilton/drk1has/hamilton_montepi/amd64_lnx_ifort/./monte: error while loading shared libraries: libsvml.so: cannot open shared object file: No such file or directory ...
> >>
> >> Aha, the slave task might not have the LD_LIBRARY_PATH.
> >> Please add a -V to the rsh wrapper:
> >>
> >> http://gridengine.sunsource.net/howto/mpich-integration.html
> >>
> >> which will also solve other issues. Also make sure you have a
> >> Tight Integration, i.e. "setenv P4_RSHCOMMAND rsh", so that the
> >> rsh wrapper is used.
> >>
> >> -- Reuti
> >>
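The following sketch makes the suggested change concrete. It is a hypothetical, simplified stand-in for the command line the rsh wrapper builds, modelled on the "qrsh -inherit -nostdin node231 ..." line in the job output above. It is not the wrapper shipped with SGE. It only shows where -V would go: among the qrsh options, before the host name. Per the thread, the wrapper is only involved under a Tight Integration (startmpi.sh -catch_rsh plus setenv P4_RSHCOMMAND rsh).

```shell
#!/bin/sh
# Hypothetical, simplified stand-in for the qrsh command the SGE rsh
# wrapper builds. Adds -V, as suggested, so the slave task is started
# with the job's environment (including LD_LIBRARY_PATH).

# build_qrsh_cmd QRSH NOSTDIN HOST CMD...
#   QRSH     path to qrsh, e.g. $SGE_ROOT/bin/$ARC/qrsh
#   NOSTDIN  1 if rsh was called with -n, else 0
build_qrsh_cmd() {
    qrsh=$1; nostdin=$2; host=$3
    shift 3
    if [ "$nostdin" -eq 1 ]; then
        echo "$qrsh -inherit -nostdin -V $host $*"
    else
        echo "$qrsh -inherit -V $host $*"
    fi
}

build_qrsh_cmd qrsh 1 node231 ./monte node231 50375
# prints: qrsh -inherit -nostdin -V node231 ./monte node231 50375
```

The function only prints the command. A real wrapper would exec it, so the slave process replaces the wrapper under SGE's control.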
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >>
> >>
> >
>




More information about the gridengine-users mailing list