[GE users] tight integration with mvapich and openmpi

Reuti reuti at staff.uni-marburg.de
Tue Oct 21 19:16:37 BST 2008


On 21.10.2008 at 16:49, Mike Hanby wrote:

> By default, slots == cores; however, you can override this and define
> as many slots as you want for a node.
>
> -----Original Message-----
> From: Joseph Hargitai [mailto:joseph.hargitai at nyu.edu]
> Sent: Tuesday, October 21, 2008 9:33 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] tight integration with mvapich and openmpi
>
>
> What does "slots" refer to in this case: cores or nodes? I can run on
> two nodes with 8 cores each with this PE. According to what you say, I
> should not be able to. Maybe I am misunderstanding something.

It refers to the maximum number of slots that can be in use inside
this PE at any one time, summed over all jobs. Even if you attach this
PE to three or more nodes with 8 slots each, you are limited to 16
slots in total in this PE. Whether that is one 16-slot job or two
8-slot jobs doesn't matter.
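
As a rough illustration of that arithmetic (a sketch only, nothing that
SGE ships; the function name and its parameters are made up for this
example), the PE-wide limit simply caps the sum of slots over all jobs
running in the PE:

```python
def max_concurrent_jobs(pe_slots: int, slots_per_job: int) -> int:
    """How many jobs of the given size fit under the PE-wide slot limit."""
    if slots_per_job <= 0:
        raise ValueError("slots_per_job must be positive")
    return pe_slots // slots_per_job


PE_SLOTS = 16  # the "slots 16" entry of the mvapich-8 PE in this thread

for job_size in (8, 16, 24):
    # Print how many jobs of each size could run at the same time.
    print(job_size, max_concurrent_jobs(PE_SLOTS, job_size))
```

With 16 slots this gives two 8-slot jobs or one 16-slot job, and a
24-slot request would not fit at all, no matter how many nodes the PE
is attached to.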

> (we are aware of the patch in the tight integration tar file - our
> mvapich however is 1x not 0.9 so do not know how applicable it is.)
>
> b, we are not testing over ethernet.

But AFAIK you still need an rsh/ssh connection to start the processes
when using InfiniBand. SGE will handle this with its own qrsh:

1) - any parallel program can call rsh
2) - this call will be caught by SGE when the PE's start_proc_args
passes the -catch_rsh option to startmpi.sh
3) - the caught call will then run qrsh instead of a plain rsh
4) - as a random port is used for the rsh started by SGE's qrsh,
there must not be any firewall between the machines
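
The catching in step 2 relies, as far as I know, on ordinary PATH
lookup: a wrapper named "rsh" in a directory ahead of the system one
is found first. A minimal sketch of just that lookup, assuming a POSIX
system; the temporary directory and the stub script are stand-ins for
this example, not SGE's actual wrapper:

```python
import os
import shutil
import stat
import tempfile


def resolve_with_wrapper_dir(name, wrapper_dir):
    """Look up `name` with wrapper_dir searched before the normal PATH."""
    search_path = wrapper_dir + os.pathsep + os.environ.get("PATH", "")
    return shutil.which(name, path=search_path)


def demo():
    """Create a stub 'rsh' in a temp dir and show that it is found first."""
    with tempfile.TemporaryDirectory() as tmpdir:
        wrapper = os.path.join(tmpdir, "rsh")
        with open(wrapper, "w") as f:
            # Stub only: a real wrapper would hand the call over to qrsh.
            f.write('#!/bin/sh\necho "would call qrsh for: $*"\n')
        os.chmod(wrapper, os.stat(wrapper).st_mode | stat.S_IXUSR)
        return wrapper, resolve_with_wrapper_dir("rsh", tmpdir)


wrapper, resolved = demo()
print(resolved == wrapper)
```

The application itself is unchanged; it still just calls "rsh" and
gets whichever executable of that name comes first in the search path.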

If the application was compiled to use ssh, you will need either to
force it to use rsh instead, or to adjust startmpi.sh to create a link
called "ssh" that points to SGE's rsh-wrapper. You can also set up SGE
to use ssh instead of rsh, but this is an additional step.

Be aware that the program called in 1) is just a name. You can
compile your application to call "foo" to start up internode
communication, and create a wrapper for "foo". Whether the qrsh in
SGE will then use rsh or ssh is a setting in SGE's configuration.
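
To illustrate that the name is arbitrary, here is a small sketch
(assuming a POSIX system; the helper name, the temporary directory and
the stub file "rsh-wrapper" are made up for this example) in which
links called "ssh" and "foo" both end up at the same wrapper:

```python
import os
import tempfile


def link_launcher_names(wrapper, link_dir, names):
    """Create one symlink per name in link_dir, each pointing at wrapper.

    Returns a dict mapping each name to the resolved target of its link.
    """
    targets = {}
    for name in names:
        link = os.path.join(link_dir, name)
        os.symlink(wrapper, link)
        targets[name] = os.path.realpath(link)
    return targets


def demo():
    """Link 'ssh' and 'foo' to a stub wrapper and report where they lead."""
    with tempfile.TemporaryDirectory() as tmpdir:
        wrapper = os.path.join(tmpdir, "rsh-wrapper")
        with open(wrapper, "w") as f:
            # Stub only: stands in for the real rsh-wrapper.
            f.write('#!/bin/sh\necho "wrapper called as $0"\n')
        targets = link_launcher_names(wrapper, tmpdir, ["ssh", "foo"])
        return os.path.realpath(wrapper), targets


wrapper_path, targets = demo()
print(all(t == wrapper_path for t in targets.values()))
```

Whatever name the application was built to call, the link sends it to
the same wrapper; which transport is used after that is decided
elsewhere.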

As this should be checked first, it may be easier to look into the
parallel startup inside SGE before implementing a Tight Integration
using InfiniBand, which is even more complex than plain MPICH(1) over
Ethernet.

-- Reuti


> j
>
> ----- Original Message -----
> From: Reuti <reuti at staff.uni-marburg.de>
> Date: Tuesday, October 21, 2008 8:46 am
> Subject: Re: [GE users] tight integration with mvapich and openmpi
>
>>> pe_name           mvapich-8
>>> slots             16
>>
>> With allocation rule 8 this means a max. of 2 jobs. Is this intended?
>>
>>> user_lists        NONE
>>> xuser_lists       NONE
>>> start_proc_args   /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile
>>> stop_proc_args    /opt/gridengine/mpi/stopmpi.sh
>>> allocation_rule   8
>>> control_slaves    TRUE
>>> job_is_first_task FALSE
>>> urgency_slots     min
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>

