[GE users] OpenMPI 1.2 integration and dedicated MPI networks

Orion Poplawski orion at cora.nwra.com
Fri Oct 20 16:45:25 BST 2006



Reuti wrote:
> Hi,
> 
> Am 20.10.2006 um 01:08 schrieb Orion Poplawski:
> 
>> I'm starting to test out OpenMPI 1.2 tight integration with SGE and 
>> have run into the following issue.  Currently, my startmpi script 
>> massages the hostnames in the machines file created from the SGE 
>> pe_hostfile to add an "x" suffix on machines that are connected with a 
>> separate GigE network dedicated for MPI traffic.
>>
>> With tight integration, openmpi uses the SGE pe_hostfile directly, e.g.:
>>
>> coop00.cora.nwra.com 2 coop.q@coop00.cora.nwra.com <NULL>
>> coop01.cora.nwra.com 2 coop.q@coop01.cora.nwra.com <NULL>
>>
>> Now, how (if at all) can I modify this so that MPI traffic goes to 
>> coop00x and coop01x?  One immediate problem I'm running into is that 
>> the startmpi script from the SGE PE runs as the job's user, so it 
>> can't modify pe_hostfile.
> 
> is the name of the pe_hostfile hardcoded to point to the one in the 
> node's spool directory, or does OpenMPI use $PE_HOSTFILE, which you 
> could reset to point to a modified copy? Another issue might be the 
> back channel of the communication, where sometimes the sender's 
> `hostname` is simply used to send the reply.

(I'm sending this to the openmpi-devel list as well to see what insight 
they may have. This seems like a common use case.)

It uses $PE_HOSTFILE, so I made a startup script that created a new 
pe_hostfile.  This requires something like the following in my job script:

setenv PE_HOSTFILE $TMPDIR/pe_hostfile
orterun -np $NSLOTS $*

It is unfortunate that this can't somehow be handled automatically.

I first tried:

coop01x.cora.nwra.com 2 coop.q@coop01.cora.nwra.com <NULL>
coop00x.cora.nwra.com 2 coop.q@coop00.cora.nwra.com <NULL>

Which yielded:

error: commlib error: access denied (client IP resolved to host name 
"coop01x.cora.nwra.com". This is not identical to clients host name 
"coop01.cora.nwra.com")
error: executing task of job 41354 failed: failed sending task to 
execd@coop00x.cora.nwra.com: can't find connection
[coop01:27468] ERROR: A daemon on node coop00x.cora.nwra.com failed to 
start as expected.
[coop01:27468] ERROR: There may be more information available from
[coop01:27468] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[coop01:27468] ERROR: If the problem persists, please restart the
[coop01:27468] ERROR: Grid Engine PE job
[coop01:27468] ERROR: The daemon exited unexpectedly with status 1.
error: commlib error: access denied (client IP resolved to host name 
"coop01x.cora.nwra.com". This is not identical to clients host name 
"coop01.cora.nwra.com")
error: executing task of job 41354 failed: failed sending task to 
execd@coop01x.cora.nwra.com: can't find connection

Then:

coop01x.cora.nwra.com 2 coop.q@coop01x.cora.nwra.com <NULL>
coop00x.cora.nwra.com 2 coop.q@coop00x.cora.nwra.com <NULL>

which yields:

error: commlib error: access denied (client IP resolved to host name 
"coop01x.cora.nwra.com". This is not identical to clients host name 
"coop01.cora.nwra.com")
error: executing task of job 41356 failed: failed sending task to 
execd@coop01x.cora.nwra.com: can't find connection
error: commlib error: access denied (client IP resolved to host name 
"coop01x.cora.nwra.com". This is not identical to clients host name 
"coop01.cora.nwra.com")
[coop01:27945] ERROR: A daemon on node coop01x.cora.nwra.com failed to 
start as expected.
[coop01:27945] ERROR: There may be more information available from
[coop01:27945] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[coop01:27945] ERROR: If the problem persists, please restart the
[coop01:27945] ERROR: Grid Engine PE job
[coop01:27945] ERROR: The daemon exited unexpectedly with status 1.
error: executing task of job 41356 failed: failed sending task to 
execd@coop00x.cora.nwra.com: can't find connection


Now, looking at the OpenMPI gridengine code, it appears to take the node 
name from the first field of each pe_hostfile line and never really uses 
the queue name for anything.

         ptr = strtok_r(buf, " \n", &tok);
         num = strtok_r(NULL, " \n", &tok);
         queue = strtok_r(NULL, " \n", &tok);
         arch = strtok_r(NULL, " \n", &tok);
...
         node->node_name = strdup(ptr);
         node->node_arch = strdup(arch);

Perhaps it could be modified to use the hostname from the queue instance 
name when making SGE/qrsh calls, but the first hostname for MPI 
communication. I'm not really sure what the intent of the two fields in 
SGE's pe_hostfile is, or whether OpenMPI can handle the idea of two 
hostnames for different purposes.

-- 
Orion Poplawski
System Administrator                  303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion at cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com
