[GE users] Help: PE details

Chris Dagdigian dag at sonsorol.org
Sun Jul 13 15:18:23 BST 2008


There really is no need for pictures -- it's actually pretty simple.

A parallel program runs across more than one machine. In order for a  
parallel program to start, it must be fed a hostlist of systems that  
it is "allowed" to use. Outside of Grid Engine, people make these  
files manually or they are provided with one by an admin.
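
As an illustration, a hand-made host file can be as simple as one
hostname per line. This is only a minimal sketch: the hostnames node01
and node02 are made up, and the exact file format depends on your MPI
implementation, so check its documentation.

```shell
# Write a host file by hand: one line per process slot.
# node01/node02 are hypothetical hostnames; substitute your own machines.
cat > ./my-host-file <<'EOF'
node01
node01
node02
node02
EOF

# Count the lines to see how many process slots the file describes.
wc -l < ./my-host-file
```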

Outside of Grid Engine, parallel programs come with a starter program,  
usually called "mpirun" or similar. The specifics depend on the type  
of parallel environment you are using. Taking a generic MPI as an  
example:

Outside of Grid Engine you would start your parallel program like this:

   #  mpirun -f ./my-host-file ./my-parallel-application


That's basically it.

The only difference when Grid Engine is involved ...

... is that in a loose integration environment, the SGE scheduler will
pick the machines and create a custom machine file made specifically
for the job. The "mpirun" program runs exactly as it would without Grid
Engine. SGE simply (a) picks which machines are to be used, (b) makes a
custom hostfile and (c) keeps your job pending until the requested
slots are available.
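
To make the "custom hostfile" step concrete, here is a rough sketch of
turning the host list SGE hands to a parallel job (the file named by
$PE_HOSTFILE) into a flat machine file. It assumes each line has the
form "hostname slots queue processor-range". The function name is made
up for this example, and this is not the script shipped with Grid
Engine.

```shell
# Convert an SGE-style PE host file into a flat machine file.
# Each input line is assumed to look like:
#   <hostname> <slots> <queue> <processor-range>
# and each hostname is printed once per slot.
pe_hostfile_to_machinefile() {
    while read -r host slots _rest || [ -n "$host" ]; do
        [ -n "$host" ] || continue    # skip blank lines
        i=0
        while [ "$i" -lt "$slots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "$1"
}

# Inside a parallel job this could be used like so (not run here):
#   pe_hostfile_to_machinefile "$PE_HOSTFILE" > "$TMPDIR/machines"
#   mpirun -f "$TMPDIR/machines" ./my-parallel-application
```

The mpirun line in the comment is the same generic command as before;
only the host file now comes from SGE instead of being written by hand.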

When you step up to tight integration, all you are letting Grid Engine
do (in addition to what you get with loose integration) is start up the
parallel tasks itself rather than leaving that to the user. This gives
SGE more control over starting and stopping jobs and gets you more
accurate accounting data.
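
As a sketch of what "SGE starts the tasks" can look like from inside a
job, the helper below starts one task on a granted host through
"qrsh -inherit" instead of plain rsh/ssh. The function name, the
DRY_RUN switch and the hostname are assumptions made for this example;
a real integration usually hides this call in a wrapper that the MPI
starter invokes.

```shell
# Start one slave task on a host that SGE granted to this job.
# With DRY_RUN=1 the command is only printed, so the sketch can be
# tried outside a cluster.
start_task() {
    host=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "qrsh -inherit $host $*"
    else
        # qrsh -inherit runs the command as a task of the current job.
        qrsh -inherit "$host" "$@"
    fi
}

# node01 is a hypothetical hostname.
DRY_RUN=1 start_task node01 ./my-parallel-application
# prints: qrsh -inherit node01 ./my-parallel-application
```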


Overall the best way to understand parallel programs is to forget  
about Grid Engine and just use your given parallel application directly.

Once you understand how all that works, it is easy to see how Grid
Engine gets involved: it either generates custom hostfiles (loose
integration) or launches the tasks itself under the control of an
sge_shepherd daemon (tight integration).

-Chris




On Jul 13, 2008, at 9:04 AM, Lee Amy wrote:

>
>
> 2008/7/12 John Hearns <john.hearns at streamline-computing.com>:
> On Sat, 2008-07-12 at 21:37 +0800, Lee Amy wrote:
> > Hello,
> >
> > I want to learn the main idea and deep details about Parallel
> > Environment. Anyone can give me some hints?
> >
>
> Amy, the best way to learn about anything is by doing.
> In my case I was presented with an Opteron cluster and told "Install
> Sun Gridengine on that".
> I used the README files and documentation on the Gridengine site and
> achieved a working installation.
> I then took the examples for the MPICH integration and used them to
> set up a PE for MPICH. I would suggest you do the same - download
> mpich, install it on your cluster, and follow the Gridengine example
> to set up MPICH integration. We will help you along the way.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
> Yes, thank you very much. I have set up PEs for MPICH and Open MPI
> integration, and I understand how to use them at a basic level: the
> start script builds a hostlist from the allocation, qrsh starts the
> slave tasks, and the stop script cleans up the processes. But I only
> know a little about them.
>
> What I really want to understand is the internal mechanisms. Could
> you show me some pictures to make the mechanisms of PE clearer?
>
> Thank you again.
>
> Regards,
>
> Amy Lee





