[GE users] OpenMPI 1.2 integration and dedicated MPI networks

Reuti reuti at staff.uni-marburg.de
Fri Oct 20 19:52:58 BST 2006


On 20.10.2006 at 17:45, Orion Poplawski wrote:

> Reuti wrote:
>> Hi,
>> On 20.10.2006 at 01:08, Orion Poplawski wrote:
>>> I'm starting to test out OpenMPI 1.2 tight integration with SGE  
>>> and have run into the following issue.  Currently, my startmpi  
>>> script massages the hostnames in the machines file created from  
>>> the SGE pe_hostfile to add an "x" suffix on machines that are  
>>> connected with a separate GigE network dedicated for MPI traffic.
>>>
>>> With tight integration, openmpi uses the SGE pe_hostfile  
>>> directly, e.g.:
>>>
>>> coop00.cora.nwra.com 2 coop.q@coop00.cora.nwra.com <NULL>
>>> coop01.cora.nwra.com 2 coop.q@coop01.cora.nwra.com <NULL>
>>>
>>> Now, how/can I modify this so that MPI traffic speaks to coop00x  
>>> and coop01x?  One immediate problem that I'm running into is that  
>>> the startmpi script from the SGE PE runs as the user of the job  
>>> so it can't modify pe_hostfile.
>> Is the name of the pe_hostfile hardcoded to point to the one in  
>> the node's spool directory, or is OpenMPI using $PE_HOSTFILE,  
>> which you could reset to point to a modified copy?  
>> Another issue might be the back-channel of the communication,  
>> where sometimes simply the `hostname` of the sender is used for  
>> the reply.
>
> (Sending this to the openmpi-devel list as well to see what insight  
> they may have.  This seems like a common use case.)
>
> It uses $PE_HOSTFILE, so I made a startup script that created a new  
> pe_hostfile.  This requires something like the following in my job  
> script:
>
> setenv PE_HOSTFILE $TMPDIR/pe_hostfile
> orterun -np $NSLOTS $*
>
> It is unfortunate that this can't be handled automatically somehow.
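
For reference, the rewrite step described above could look roughly like the following sketch. It is not the actual start script from this thread: the function name rewrite_pe_hostfile is made up, and it assumes every host in the job has an "x" interface, whereas the original startmpi script only touched machines on the dedicated network. Only the first field of each line is changed, which corresponds to the first attempt quoted further down.

```shell
#!/bin/sh
# Hypothetical helper for a PE start script (name is an assumption).
# Copies a pe_hostfile, appending "x" to the first label of the first
# field only, e.g. coop00.cora.nwra.com -> coop00x.cora.nwra.com.
# The slot count, queue instance and processor range are left alone.
rewrite_pe_hostfile() {
    # $1 = original pe_hostfile, $2 = rewritten copy
    sed 's/^[^. ][^. ]*/&x/' "$1" > "$2"
}
```

A start script would call it with the hostfile path SGE provides and a target such as $TMPDIR/pe_hostfile, and the job script would then point PE_HOSTFILE at the copy as in the setenv line above. Note that, per the errors reported below, renaming the first field alone is not enough for SGE to accept the connections.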
>
> First tried:
>
> coop01x.cora.nwra.com 2 coop.q@coop01.cora.nwra.com <NULL>
> coop00x.cora.nwra.com 2 coop.q@coop00.cora.nwra.com <NULL>
>
> Which yielded:
>
> error: commlib error: access denied (client IP resolved to host  
> name "coop01x.cora.nwra.com". This is not identical to clients host  
> name "coop01.cora.nwra.com")

Can you try to use a host_aliases file for these machines? Routing  
the SGE traffic over the secondary network as well shouldn't have  
much of an impact. This way the pe_hostfile should already be filled  
with the correct machine names for the second network. As mentioned,  
only the communication back may still be a problem if OpenMPI just  
uses `hostname`.
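
A rough sketch of how such a file could be generated, under some assumptions: host_aliases lives in $SGE_ROOT/$SGE_CELL/common/, each line lists a primary name followed by its aliases, and the "x" names are the ones that should become primary. Whether that direction is right for a given cluster needs to be checked against the sge_h_aliases man page; the function name make_host_aliases is only for illustration.

```shell
#!/bin/sh
# Print one host_aliases line per hostname read from stdin, in the form
#   <primary name> <alias>
# ASSUMPTION: the "x" (MPI network) name is chosen as the primary name
# and the regular hostname becomes its alias.
make_host_aliases() {
    while read -r host; do
        # skip empty input lines
        [ -n "$host" ] || continue
        # append "x" to the first label of the hostname
        xhost=$(printf '%s\n' "$host" | sed 's/^[^.][^.]*/&x/')
        printf '%s %s\n' "$xhost" "$host"
    done
}
```

Fed with coop00.cora.nwra.com and coop01.cora.nwra.com, this prints one line per machine that could be reviewed and then placed into the host_aliases file.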

-- Reuti


> error: executing task of job 41354 failed: failed sending task to  
> execd@coop00x.cora.nwra.com: can't find connection
> [coop01:27468] ERROR: A daemon on node coop00x.cora.nwra.com failed  
> to start as expected.
> [coop01:27468] ERROR: There may be more information available from
> [coop01:27468] ERROR: the 'qstat -t' command on the Grid Engine tasks.
> [coop01:27468] ERROR: If the problem persists, please restart the
> [coop01:27468] ERROR: Grid Engine PE job
> [coop01:27468] ERROR: The daemon exited unexpectedly with status 1.
> error: commlib error: access denied (client IP resolved to host  
> name "coop01x.cora.nwra.com". This is not identical to clients host  
> name "coop01.cora.nwra.com")
> error: executing task of job 41354 failed: failed sending task to  
> execd@coop01x.cora.nwra.com: can't find connection
>
> Then:
>
> coop01x.cora.nwra.com 2 coop.q@coop01x.cora.nwra.com <NULL>
> coop00x.cora.nwra.com 2 coop.q@coop00x.cora.nwra.com <NULL>
>
> which yields:
>
> error: commlib error: access denied (client IP resolved to host  
> name "coop01x.cora.nwra.com". This is not identical to clients host  
> name "coop01.cora.nwra.com")
> error: executing task of job 41356 failed: failed sending task to  
> execd@coop01x.cora.nwra.com: can't find connection
> error: commlib error: access denied (client IP resolved to host  
> name "coop01x.cora.nwra.com". This is not identical to clients host  
> name "coop01.cora.nwra.com")
> [coop01:27945] ERROR: A daemon on node coop01x.cora.nwra.com failed  
> to start as expected.
> [coop01:27945] ERROR: There may be more information available from
> [coop01:27945] ERROR: the 'qstat -t' command on the Grid Engine tasks.
> [coop01:27945] ERROR: If the problem persists, please restart the
> [coop01:27945] ERROR: Grid Engine PE job
> [coop01:27945] ERROR: The daemon exited unexpectedly with status 1.
> error: executing task of job 41356 failed: failed sending task to  
> execd@coop00x.cora.nwra.com: can't find connection
>
>
> Now, looking at the OpenMPI gridengine code, it looks like it gets  
> the node name from the first entry in the pe_hostfile, and never  
> really uses the queue name for anything.
>
>         ptr = strtok_r(buf, " \n", &tok);
>         num = strtok_r(NULL, " \n", &tok);
>         queue = strtok_r(NULL, " \n", &tok);
>         arch = strtok_r(NULL, " \n", &tok);
> ...
>         node->node_name = strdup(ptr);
>         node->node_arch = strdup(arch);
>
> Perhaps it can be modified so that it uses the hostname from the queue  
> name when doing SGE/qrsh calls, but the first hostname when doing MPI  
> communication. Not really sure what the intent of the two fields in  
> SGE's pe_hostfile is, or if OpenMPI can handle the idea of two  
> hostnames for different purposes.
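
To make the two-hostname idea concrete, here is a small shell sketch (not OpenMPI code) that pulls both names out of each pe_hostfile line: the first field, and the host part of the third field, assuming the latter has the usual queue@host form. The function name split_pe_hostfile and the output layout are invented for illustration only.

```shell
#!/bin/sh
# For each pe_hostfile line on stdin, print:
#   <name for MPI traffic> <name SGE knows the execd by> <slots>
# The first value is field 1; the second is whatever follows the "@"
# in field 3 (if there is no "@", field 3 is printed unchanged).
split_pe_hostfile() {
    while read -r node slots queue rest; do
        # skip empty input lines
        [ -n "$node" ] || continue
        sge_host=${queue#*@}
        printf '%s %s %s\n' "$node" "$sge_host" "$slots"
    done
}
```

With a line such as the ones from the first attempt above, this yields the "x" name for MPI communication and the plain name for anything that has to go through qrsh.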
>
> -- 
> Orion Poplawski
> System Administrator                  303-415-9701 x222
> NWRA/CoRA Division                    FAX: 303-415-9702
> 3380 Mitchell Lane                  orion at cora.nwra.com
> Boulder, CO 80301              http://www.cora.nwra.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net




