[GE users] MPI Job Execution

Reuti reuti at staff.uni-marburg.de
Sat Sep 27 13:42:12 BST 2008


Hi,

On 27.09.2008 at 09:01, rajesh britto wrote:

> After exporting the variables I still get the same error in my job  
> output file. When I check the error file, it is empty.
>
> [sgeadmin@slaserver ~]$ cat cpi.o30
>
> Got 2 slots.
>
> Cannot read /tmp/30.1.all.q/machines.
>
It seems that the file "machines" either wasn't written or isn't  
readable. You can investigate this by putting a "sleep 300" or so in  
the jobscript before the "mpirun ..." line. While the job is sleeping,  
you can go to the node and check the contents of the job's $TMPDIR,  
a directory in /tmp on the node whose name starts with the job number.
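
A minimal sketch of that check, assuming it is pasted into the jobscript. check_machines is a hypothetical helper name (not part of SGE or MPICH); it only distinguishes the two cases named above, a missing file and an unreadable one.

```shell
#!/bin/sh
# Hypothetical helper: report whether <dir>/machines exists and is readable.
check_machines() {
    file="$1/machines"
    if [ ! -e "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    if [ ! -r "$file" ]; then
        echo "not readable: $file"
        return 1
    fi
    # File is usable: show its name and contents in the job output.
    echo "ok: $file"
    cat "$file"
}

# Intended use in the jobscript, just before the mpirun line:
#   check_machines "$TMPDIR"
#   ls -la "$TMPDIR"
#   sleep 300   # keeps the job alive so $TMPDIR can be inspected on the node
```

The result ends up in the job's output file (cpi.o<jobid> here), so it can be read even after the job has finished.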
> Looked for files with extension LINUX in
> directory /usr/local/mpich-1.2.6/util/machines .
>
-- Reuti

> ---RB
>
>
> On Fri, Sep 26, 2008 at 3:53 PM, Reuti <reuti at staff.uni-marburg.de>  
> wrote:
> On 26.09.2008 at 11:55, rajesh britto wrote:
>
> Thanks, Chris and Reuti, for your information. I am now able to  
> run an MPI program. My MPI job was submitted successfully, but  
> when I check the job output it shows the following error.
>
> [sgeadmin@slaserver ~]$ cat mpi.o126
> Warning: no access to tty (Bad file descriptor).
> Thus no job control in this shell.
>
> This is the output from the c shell. If you prefer the shell  
> mentioned in the jobscript, the queue definition has to be changed  
> to read "shell_start_mode      unix_behavior".
>
> http://gridengine.sunsource.net/howto/commonproblems.html
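
A hedged sketch of checking and changing that setting. The queue name all.q is an assumption here, and start_mode_of is a hypothetical helper that only parses text in the "name value" layout printed by "qconf -sq".

```shell
#!/bin/sh
# Hypothetical helper: print the shell_start_mode value found in a queue
# configuration read from stdin (the layout printed by "qconf -sq <queue>").
start_mode_of() {
    awk '$1 == "shell_start_mode" { print $2 }'
}

# Intended use on a host with SGE admin rights (all.q is an assumption):
#   qconf -sq all.q | start_mode_of
#   qconf -mattr queue shell_start_mode unix_behavior all.q
```

If changing the queue is not an option, requesting the shell per job with "qsub -S /bin/sh" is another way to get the script run by sh.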
>
> Got 2 slots.
> Cannot read /usr/local/mpich-1.2.6/util/machines/machine.LINUX.
> Looked for files with extension LINUX in
> directory /usr/local/mpich-1.2.6/util/machines .
>
> #!/bin/sh
> echo "Got $NSLOTS slots."
> export MPICH_PROCESS_GROUP=no
> export P4_RSHCOMMAND=rsh
> mpirun -np $NSLOTS -machinefile $TMPDIR/machines ...
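
For context on where $TMPDIR/machines comes from: with the MPICH templates in $SGE_ROOT/mpi/, it is the PE start procedure (startmpi.sh) that writes this file from $PE_HOSTFILE, listing each host once per granted slot. The function below is a simplified sketch of that conversion, not the shipped script, and hostfile_to_machines is a hypothetical name.

```shell
#!/bin/sh
# Simplified sketch: expand pe_hostfile lines ("host nslots queue range")
# read from stdin into one hostname per slot, the layout that
# "mpirun -machinefile" expects.
hostfile_to_machines() {
    while read -r host nslots _rest; do
        # Skip blank or incomplete lines.
        [ -n "$host" ] && [ -n "$nslots" ] || continue
        i=0
        while [ "$i" -lt "$nslots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done
}

# Intended use in a PE start procedure:
#   hostfile_to_machines < "$PE_HOSTFILE" > "$TMPDIR/machines"
```

If the PE has no such start procedure configured, nothing creates the file, which would be one possible cause of a "Cannot read .../machines" message.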
>
> -- Reuti
>
>
> regards,
>
> RB
>
> On Fri, Sep 26, 2008 at 3:15 PM, Reuti <reuti at staff.uni-marburg.de>  
> wrote:
> Hi,
>
> On 26.09.2008 at 07:03, rajesh britto wrote:
>
>
> Hi,
>
> Thanks for the information. I installed the MPICH implementation.  
> Can anyone tell me how to submit an MPI job using MPICH and SGE?
>
> What do you mean in detail: have you already set up a parallel  
> environment, or just a plain mpich(1) installation? As Chris pointed  
> out, there are documents in $SGE_ROOT/mpi/ to get started.
>
> You will find additional hints for a Tight Integration here:
>
> http://gridengine.sunsource.net/howto/mpich-integration.html
>
> -- Reuti
>
>
>
> On Thu, Sep 25, 2008 at 7:59 PM, Chris Dagdigian <dag at sonsorol.org>  
> wrote:
> Hello,
>
> This is what I'd recommend:
>
> (1) Determine what sort of MPI environment you have or need to  
> install (there are many implementations of the MPI standard)
> (1.5) If you don't know what MPI to start with, start with OpenMPI  
> from openmpi.org as that works beautifully with SGE in tight  
> integration mode
> (2) Set up and install MPI
> (3) Compile the example cpi.c program using MPICC
> (4) Run your MPI job outside of Grid Engine using passwordless SSH  
> to the nodes
>
> The basic idea here is that integrating MPI with Grid Engine is  
> far, far easier if you are first able to validate for yourself that  
> MPI works on its own. I've seen many "SGE can't handle MPI" trouble  
> tickets where the actual problem was with the MPI install and not  
> Grid Engine
>
> Once you have MPI working outside of SGE and ideally with a real  
> world application then you can move on to SGE work ...
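
Steps (3) and (4) above can be sketched as a small script. It assumes the MPI wrappers mpicc and mpirun are on PATH, that cpi.c is in the current directory, and that hosts.txt is a hand-written machinefile; the file name hosts.txt and the helper name run_cpi_standalone are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compile and run cpi outside Grid Engine.
# Returns 2 without doing anything if the MPI tools are not on PATH.
run_cpi_standalone() {
    np=$1
    machinefile=$2
    for tool in mpicc mpirun; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool"
            return 2
        fi
    done
    # Build the example, then start it on the hosts in the machinefile.
    mpicc -o cpi cpi.c || return 1
    mpirun -np "$np" -machinefile "$machinefile" ./cpi
}

# Intended use:
#   run_cpi_standalone 2 hosts.txt
```

Only once this runs cleanly across the nodes is it worth moving on to the Grid Engine side.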
>
> Resources:
> (1) example scripts can be found in $SGE_ROOT/mpi/
> (2) The documentation on wikis.sun.com for Grid Engine covers  
> parallel environments (PEs) well
> (3) A few google searches on "tight mpi integration with SGE" or  
> similar will show you other HOWTO methods
>
> If you use OpenMPI and compile it with the "--with-sge" option then  
> OpenMPI will automatically detect that it is running under Grid  
> Engine and will do the right thing. In my opinion this is currently  
> the fastest and easiest way to get a working tight MPI integration  
> into SGE.
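
A hedged sketch of how one might confirm that an OpenMPI build picked up the option. has_sge_support is a hypothetical helper; it assumes that ompi_info lists a "gridengine" component when SGE support was compiled in.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the ompi_info output on stdin mentions
# a gridengine component.
has_sge_support() {
    grep -q gridengine
}

# Intended use:
#   ./configure --with-sge && make && make install   # when building OpenMPI
#   ompi_info | has_sge_support && echo "SGE support present"
```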
>
> For the difference between "tight" and "loose" PE integration and  
> why tight is better if you can achieve it, this link may help:
>
> http://gridengine.info/2005/09/19/parallel-environments-pes-loose-vs-tight-integration
>
> -Chris
>
>
>
>
> On Sep 25, 2008, at 9:18 AM, rajesh britto wrote:
>
> Hi,
> I have installed SGE 6.2 on a cluster and it works fine for  
> sequential jobs. When I submit an MPI job, the job is accepted  
> but stays in the waiting state (qw).
> I need to know how to set up a parallel environment to run an MPI  
> job. If anyone has any documentation, it would be useful.
> Thanks in advance.
>
> with regards,
> RB
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net

