[GE users] Gaussian 03, Revision C.02 parallel + SGE

reuti reuti at staff.uni-marburg.de
Sun Sep 27 16:13:14 BST 2009


Hi,

please have a look here:

http://gridengine.sunsource.net/ds/viewMessage.do?dsMessageId=85413&dsForumId=38

You don't need step 3 with SGE 6.0, as long as you attach each PE to
one and only one queue, as they then already have the same name on all
nodes.

On all machines you must get the same number of cores.
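
A minimal sketch of how the granted hosts could be checked and handed to Linda from inside the job script. It assumes the PE's start procedure writes $TMPDIR/machines with one hostname per line, repeated once per granted slot; the function names are hypothetical, and whether the host count is the right value for %NProcLinda on your installation is also an assumption.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, and the machines-file layout
# (one hostname per line, repeated once per slot) is an assumption.

# Print the hostnames found in the machines file, skipping blank lines.
hosts_in() {
    awk 'NF {print $1}' "$1"
}

# Print "<slots> <host>" for every distinct host.
slots_per_host() {
    hosts_in "$1" | sort | uniq -c | awk '{print $1, $2}'
}

# Succeed only if every host was granted the same number of slots.
same_slots_everywhere() {
    slots_per_host "$1" | awk '{print $1}' | sort -u |
        awk 'END {exit (NR == 1) ? 0 : 1}'
}

# Print the distinct hosts on one line, separated by single spaces.
node_list() {
    hosts_in "$1" | sort -u | paste -s -d ' ' -
}

# Print the two Link 0 lines derived from the machines file:
# slots on the first host -> %NProcShared, distinct hosts -> %NProcLinda.
link0_lines() {
    local shared hosts
    shared=$(slots_per_host "$1" | awk 'NR == 1 {print $1}')
    hosts=$(hosts_in "$1" | sort -u | awk 'END {print NR}')
    echo "%NProcShared=$shared"
    echo "%NProcLinda=$hosts"
}
```

In a job script this could be used as `same_slots_everywhere "$TMPDIR/machines" || exit 1`, followed by `echo "Tsnet.Appl.nodelist: $(node_list "$TMPDIR/machines")"` to produce the nodelist entry, and `link0_lines "$TMPDIR/machines"` in place of the hard-coded values at the top of the input file.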

If you need more details, please wait until October 5th, unless
someone else steps in.

-- Reuti


Quoting sangamesh <forum.san at gmail.com>:

> Dear SGE users,
>
>        I'm facing two issues with Gaussian-Linda, G03 revision C.02 for
> running a parallel job in SGE using Linda.
>
> (1) How to integrate G03.C02 Linda with SGE
>
> The G03.D01 release is the next revision after G03.C02. G03.D01 provides
> %NProcShared and %LindaWorker to run Linda jobs across multiple nodes under
> a shared + distributed memory model. %LindaWorker can be used to specify
> the list of machines from SGE's $TMPDIR/machines file.
>
> But G03.C02 doesn't have the %LindaWorker directive. As far as I understand,
> it supports only %NProcShared and %NProcLinda. So in the absence of
> %LindaWorker, how do I integrate it with SGE, i.e. how do I tell Linda to
> run g03 on the hosts SGE has scheduled?
>
> (2) Error Issue: "l302.exel: error while loading shared libraries: util.so:
> cannot open shared object file: No such file or directory
> died without ever signing in Sign in timed out after 0 worker connections.
> Did not reach minimum (1), shutting down."
>
> For running parallel Gaussian jobs in SGE, I have configured .tsnet.config
> as follows:
> $ cat ~/.tsnet.config
> Tsnet.Appl.nodelist: compute-0-0.local compute-0-1.local compute-0-2.local
> compute-0-3.local compute-0-4.local compute-0-5.local compute-0-6.local
> compute-0-7.local compute-0-8.local compute-0-9.local compute-0-10.local
> compute-0-11.local compute-0-12.local compute-0-13.local compute-0-14.local
> compute-0-15.local compute-0-16.local compute-0-17.local compute-0-18.local
> compute-0-19.local compute-0-20.local compute-0-21.local compute-0-22.local
>
> Tsnet.Appl.verbose: True
> Tsnet.Appl.veryverbose: True
> Tsnet.Node.lindarsharg: ssh
> Tsnet.Appl.useglobalconfig: false
>
> The Job submit script is as follows:
>
> $ cat sgegausub_1.sh
> #!/bin/bash
>
> #$ -N go3linda
> #$ -S /bin/bash
> #$ -cwd
> #$ -q all.q
> #$ -e err.$JOB_ID.$JOB_NAME
> #$ -o out.$JOB_ID.$JOB_NAME
> g03root=/apps/gaussian-linda
> GAUSS_EXEDIR=/apps/gaussian-linda/g03:/apps/gaussian-linda/g03/linda-exe
> GAUSS_SCRDIR=$HOME/g03scrdir/$JOB_ID
> LD_LIBRARY_PATH=/apps/gaussian-linda/g03:/apps/gaussian-linda/g03/linda-exe:$LD_LIBRARY_PATH
> #$ -v GAUSS_SCRDIR=$HOME/g03scrdir/$JOB_ID
> PATH=$GAUSS_EXEDIR:$PATH
> #$ -v LD_LIBRARY_PATH=/apps/gaussian-linda/g03:/apps/gaussian-linda/g03/linda-exe:$LD_LIBRARY_PATH
> export g03root GAUSS_EXEDIR PATH LD_LIBRARY_PATH GAUSS_SCRDIR
> #$ -V
> if [ ! -d $GAUSS_SCRDIR ]; then
> echo "Creating directory $GAUSS_SCRDIR"
> mkdir -p  $GAUSS_SCRDIR
> if [ ! -d $GAUSS_SCRDIR ]; then
> echo "Failed to create $GAUSS_SCRDIR"
>  exit 1
> fi
> fi
> source /apps/gaussian-linda/g03/bsd/g03.profile
> file_orig=/home1/g03/apps_test/gaussian/testprl/test000.com
> PAR_ENV=2
> gjoutfile=
> echo %NProcShared=4  > $file_orig.$JOB_ID
> echo %NProcLinda=`echo $PAR_ENV`  >> $file_orig.$JOB_ID
> cat $file_orig            >> $file_orig.$JOB_ID
> /apps/gaussian-linda/g03/bsd/g03l $file_orig.$JOB_ID
>
> The error it gives is:
> ntsnet: starting master process on compute-0-11.local
> /apps/gaussian-linda/g03/linda7.1/intel-linux2.4-rh8/bin/linda_sh
> /apps/gaussian-linda/g03/linda-exe/l302.exel 0
> /home1/g03/g03scrdir/162/Gau-27587.chk 0
> /home1/g03/g03scrdir/162/Gau-27587.int 0
> /home1/g03/g03scrdir/162/Gau-27587.rwf 0
> /home1/g03/g03scrdir/162/Gau-27587.d2e 0
> /home1/g03/g03scrdir/162/Gau-27587.scr 0
> /home1/g03/g03scrdir/162/Gau-27586.inp 0 junk.out 0 +LARGS 23 0 -kainterval
> 1 -master 17687 -tsnetport 46621 -maxworkers 1 -minworkers 1 -minwait 600
> -maxwait 600 -nodename compute-0-11.local -kaon
> ntsnet: starting 1 worker on compute-0-0.local
> /apps/gaussian-linda/g03/linda7.1/intel-linux2.4-rh8/bin/linda_rsh
> compute-0-0.local -r ssh /apps/gaussian-linda/g03/linda-exe/l302.exel 0
> /home1/g03/g03scrdir/162/Gau-27587.chk 0
> /home1/g03/g03scrdir/162/Gau-27587.int 0
> /home1/g03/g03scrdir/162/Gau-27587.rwf 0
> /home1/g03/g03scrdir/162/Gau-27587.d2e 0
> /home1/g03/g03scrdir/162/Gau-27587.scr 0
> /home1/g03/g03scrdir/162/Gau-27586.inp 0 junk.out 0 +LARGS 23 1 -maxworkers
> 1 -chdir /home1/g03/apps_test/gaussian/testprl -worker
> compute-0-11.local:17687 -workerwait 900 -tsnetref 1 -nodename
> compute-0-0.local
> ntsnet: exec'ing
> /apps/gaussian-linda/g03/linda7.1/intel-linux2.4-rh8/bin/LindaLauncher
> /tmp/162.1.all.q/viaExecDatatFndsC
> /apps/gaussian-linda/g03/linda-exe/l302.exel: error while loading shared
> libraries: util.so: cannot open shared object file: No such file or
> directory
> subprocess pid = 27613 has exited. status = 0x7f00, id = 0, state = 13.
> command was
> /apps/gaussian-linda/g03/linda7.1/intel-linux2.4-rh8/bin/linda_rsh
> compute-0-0.local -r ssh /apps/gaussian-linda/g03/linda-exe/l302.exel 0
> /home1/g03/g03scrdir/162/Gau-27587.chk 0
> /home1/g03/g03scrdir/162/Gau-27587.int 0
> /home1/g03/g03scrdir/162/Gau-27587.rwf 0
> /home1/g03/g03scrdir/162/Gau-27587.d2e 0
> /home1/g03/g03scrdir/162/Gau-27587.scr 0
> /home1/g03/g03scrdir/162/Gau-27586.inp 0 junk.out 0 +LARGS 0
> compute-0-0.local 10.1.1.243 43461 1 1 /home1/g03/apps_test/gaussian/testprl
> died without ever signing in
> Sign in timed out after 0 worker connections.
> Did not reach minimum (1), shutting down.
>
> The error is clear: it cannot find the util.so shared object file.
> Even though LD_LIBRARY_PATH is set, the library is still not found. The
> /apps directory is a storage directory and is mounted on all the nodes.
>
> What could be the reason for this and how to resolve the issue?
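
One possible cause (an assumption, not confirmed in this thread): the worker is started through a non-interactive ssh, which does not inherit the variables exported in the job script on the master node, so LD_LIBRARY_PATH may simply be unset on compute-0-0.local. A small sketch to check this, with hypothetical helper names:

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical.

# Read `ldd` output on stdin and print every library reported as "not found".
missing_libs() {
    awk '/=> not found/ {print $1}'
}

# Show LD_LIBRARY_PATH and the ldd result for a binary as seen by a
# non-interactive ssh session on the given host.
check_remote() {
    local host=$1 binary=$2
    ssh "$host" "echo \"LD_LIBRARY_PATH=\$LD_LIBRARY_PATH\"; ldd '$binary'"
}
```

For example, `check_remote compute-0-0.local /apps/gaussian-linda/g03/linda-exe/l302.exel | missing_libs` would print util.so if the remote shell cannot resolve it. If so, setting LD_LIBRARY_PATH in a shell startup file that non-interactive logins read on the nodes is one way to test the theory.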
>
> Thanks,
> Sangamesh
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=219272

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=219314
