[GE users] OpenMPI 1.2 integration and dedicated MPI networks

Reuti reuti at staff.uni-marburg.de
Fri Oct 20 07:59:20 BST 2006


On 20.10.2006 at 01:08, Orion Poplawski wrote:

> I'm starting to test out OpenMPI 1.2 tight integration with SGE and  
> have run into the following issue.  Currently, my startmpi script  
> massages the hostnames in the machines file created from the SGE  
> pe_hostfile to add an "x" suffix on machines that are connected with a  
> separate GigE network dedicated for MPI traffic.
> With tight integration, openmpi uses the SGE pe_hostfile directly,  
> e.g.:
> coop00.cora.nwra.com 2 coop.q@coop00.cora.nwra.com <NULL>
> coop01.cora.nwra.com 2 coop.q@coop01.cora.nwra.com <NULL>
> Now, how (if at all) can I modify this so that MPI traffic goes to coop00x  
> and coop01x?  One immediate problem that I'm running into is that  
> the startmpi script from the SGE PE runs as the user of the job so  
> it can't modify pe_hostfile.

Is the name of the pe_hostfile hardcoded to point to the one in the  
node's spool directory, or does OpenMPI use $PE_HOSTFILE, which you  
could reset to point to a modified copy? Another issue might be the  
back channel of the communication, where sometimes simply the  
`hostname` of the sender is used as the address to reply to.
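
If OpenMPI does read $PE_HOSTFILE (not confirmed in this thread), the  
second option could look roughly like the sketch below. It copies the  
pe_hostfile to a user-writable location and adds the "x" suffix to the  
host name in the first column. The function names, the choice to put  
the suffix on the short name before the domain, and leaving the queue  
column untouched are all assumptions made for illustration, not  
anything SGE or OpenMPI prescribes.

```python
import os
import tempfile


def add_suffix(line, suffix="x"):
    """Return one pe_hostfile line with `suffix` appended to the short
    host name in the first column (coop00.example -> coop00x.example).

    Assumption: only the first column is rewritten; slots, queue and
    processor range are passed through unchanged.
    """
    fields = line.split()
    if not fields:
        return line
    host, _, domain = fields[0].partition(".")
    fields[0] = host + suffix + ("." + domain if domain else "")
    return " ".join(fields)


def write_modified_hostfile(src_path, dst_dir=None, suffix="x"):
    """Write a suffixed copy of the hostfile at `src_path` into a new
    temporary file and return the path of that copy.

    The original file is never touched, so this works when the job
    user has read-only access to the spool directory.
    """
    with open(src_path) as src:
        lines = [add_suffix(raw.rstrip("\n"), suffix) for raw in src]
    fd, dst_path = tempfile.mkstemp(prefix="pe_hostfile.", dir=dst_dir,
                                    text=True)
    with os.fdopen(fd, "w") as dst:
        dst.write("\n".join(lines) + "\n")
    return dst_path
```

The job script would call write_modified_hostfile(os.environ["PE_HOSTFILE"])  
and export PE_HOSTFILE with the returned path before starting mpirun.  
Whether OpenMPI then picks up the new file is exactly the open question  
above, and this does nothing about the back-channel issue.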

-- Reuti
