[GE users] parallel jobs submission help

Wheeler, Dr M.D. mdw10 at leicester.ac.uk
Thu Nov 25 21:09:21 GMT 2004



If I add the lines

echo $TMPDIR/machines
more $TMPDIR/machines

to my script, I get the following output:
 
/tmp/60.1.compute-0-0.q/machines
::::::::::::::
/tmp/60.1.compute-0-0.q/machines
::::::::::::::
compute-0-0
compute-0-0
 
Does this mean that I only have one node registered in my SGE configuration?
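
For reference, a quick way to double-check this on the frontend, assuming the standard SGE command-line tools are in the path:

qconf -sel      # list the execution hosts registered with the qmaster
qhost           # show load, CPUs and memory per host
qconf -spl      # list the configured parallel environments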
 
Cheers
Martyn
 

________________________________

From: Reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Thu 25/11/2004 18:50
To: users at gridengine.sunsource.net
Subject: Re: [GE users] parallel jobs submission help



Hi,

> I have SGE installed on my rocks cluster.  I have also installed some
> computational chemistry software called molpro.  If I want to run this
> program interactively on 2 CPUs I would simply issue the command
>
> molpro -n2 file.com
>
> where file.com is my input file. After doing this and looking at top, I see
> the line
>
>  PID USER     PRI  NI  SIZE  RSS SHARE STAT  %CPU %MEM   TIME CPU COMMAND
> 3943 root      25   0 49540  48M  5604 R    199.9  0.8   0:09   0 molprop_2002

you have to set up a parallel environment (PE) in SGE to get it working properly
with parallel jobs. See "man sge_pe".
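
As a rough sketch (the PE name, slot count and paths below are only examples from
a typical installation, not your actual setup), a PE modelled on the shipped mpi
template could look like this; save it to a file, register it with
"qconf -Ap mpi.pe" and then attach "mpi" to your queue's pe_list with
"qconf -mq <queue>":

pe_name            mpi
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/sge/mpi/startmpi.sh $pe_hostfile
stop_proc_args     /opt/sge/mpi/stopmpi.sh
allocation_rule    $fill_up
control_slaves     FALSE
job_is_first_task  TRUE

A job would then request it with "qsub -pe mpi 2 molpro.sh".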

> indicating that the process is running on 2 CPUs.
>
> I now want to emulate this using SGE so that I can launch jobs from my
> frontend node to the various compute nodes.
>
> I have a script:
>
> $ cat molpro.sh
> #!/bin/bash
> #
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
>
> molpro -n2 test.com

For integration with SGE use "molpro -n $NSLOTS test.com", so that Molpro picks up
the number of slots actually granted by SGE.
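
A hedged sketch of how the submit script could then look (the PE name "mpi" is
only an example and has to match the PE you actually configured):

#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe mpi 2

# $NSLOTS is set by SGE to the number of slots granted to this job
molpro -n $NSLOTS test.com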

Well, we also just bought a parallel license for Molpro, and I wonder whether you
need scratch space that is shared between all the nodes for parallel jobs. In that
case everything would go to the file server, so it might be worth setting some
nodes aside and using PVFS (a parallel file server) to speed things up. Anyway,
because Molpro has only one variable for this, which defaults to $SCRATCH, you
must set SCRATCH=$TMPDIR. In a second step (for multi-node jobs) you can add a
loop to startmpi.sh that creates identically named scratch directories on all
nodes (if this works at all with Molpro).
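
As an untested sketch of such a loop at the end of startmpi.sh (assuming rsh
between the nodes and that the machinefile has already been written to
$TMPDIR/machines; replace rsh with ssh if that is what you use):

# create an identically named scratch directory on every node of the job
for node in `sort -u $TMPDIR/machines`; do
    rsh $node mkdir -p $TMPDIR
done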

The other thing which is not well documented is that Molpro expects the names of
the nodes in $PBS_NODEFILE, so you also have to point that variable to the
machinefile, i.e. PBS_NODEFILE=$TMPDIR/machines, with one line per task on each
node in this file.
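
In the job script that would simply be (again assuming the PE writes the
machinefile to $TMPDIR/machines, as the default startmpi.sh does):

# Molpro reads the node list from the file named in PBS_NODEFILE
export PBS_NODEFILE=$TMPDIR/machines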

Okay, the first step is to get it working on one node, and I would also suggest
starting with the mpi PE example. For the setup of the PE I intend to use an
allocation_rule of 2. This way you can use the cleanipcs script from e.g. MPICH
to remove shared memory segments which may still be there after the end of the
job, without killing other jobs from the same user on that node.
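
For illustration (the path to cleanipcs is just an example, it ships with MPICH
under sbin/; in practice you would probably call it from a wrapper around
stopmpi.sh), the relevant PE lines could look like:

allocation_rule    2
stop_proc_args     /opt/mpich/sbin/cleanipcs

With a fixed allocation rule of 2 on dual-CPU nodes a job always gets a node to
itself, so removing all of the user's IPC segments in the stop procedure cannot
hit another job of the same user on that node.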

Sorry, many things at once. And be assured: it's not a naive question, and it is
not a trivial setup to get working correctly. BTW: are you using rsh or ssh
between the nodes?

Cheers - Reuti

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net








