[GE users] Problems integrating MPICH2 and SGE

Reuti reuti at staff.uni-marburg.de
Thu Apr 17 17:35:28 BST 2008


On 17.04.2008 at 18:27, Instituto de Ingenieria Área de
Sistemas Unix/Linux wrote:

> Well, the problem is still there.
>
> I manually removed the 1.04 directory, and installed 1.07 in its place
> - also replicating it in the nodes. No luck. Same error.

If there is still an smpd version mismatch, then maybe you have to set
something in your .bashrc to get the correct $PATH for the
non-interactive qrsh to the slave nodes.
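
A minimal sketch of such a .bashrc entry, assuming MPICH2 1.0.7 lives in
~/local/mpich2-1.0.7 (the install location suggested in the earlier
reply quoted below; adjust it to your own prefix). The variable name
MPICH2_HOME is only illustrative. Whether a non-interactive qrsh reads
.bashrc at all depends on how the shell is started on the nodes, so
treat this as a starting point:

```shell
# ~/.bashrc fragment (assumed install prefix, adjust as needed)
MPICH2_HOME="$HOME/local/mpich2-1.0.7"

# Prepend the MPICH2 bin directory once, so that mpiexec and smpd
# from this installation are found before any other version.
case ":$PATH:" in
    *":$MPICH2_HOME/bin:"*) ;;                 # already present
    *) PATH="$MPICH2_HOME/bin:$PATH" ;;
esac
export PATH
```

Some distributions ship a .bashrc that returns early for
non-interactive shells; in that case the lines above have to come
before that early return to have any effect.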

> So I went *back* to 1.06. This time the error was different:

1.0.6 is broken unless you fix it:

http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=23233

-- Reuti


> Aborting: unable to connect to tonatiuh.iingen.unam.mx
>
> I figured that, as a quick fix until I find the root of the problem, I
> could just comment out the frontend entry in the machines file. No luck:
>
> op_connect error: socket connection failed, error stack:
> MPIDU_Socki_handle_connect(791): connection failure
> (set=1,sock=16777216,errno=111:(strerror() not found))
> unable to connect mpiexec tree, socket connection failed, error stack:
> MPIDU_Socki_handle_connect(791): connection failure
> (set=1,sock=16777216,errno=111:(strerror() not found)).
>
> I'm clueless here.
>
> 2008/4/16, Reuti <reuti at staff.uni-marburg.de>:
>> Hi,
>>
>>  On 16.04.2008 at 21:53, Instituto de Ingenieria Área de Sistemas
>> Unix/Linux wrote:
>>
>>
>>> I'm trying to install MPICH2 with tight integration with the SGE. I
>>> have followed Reuti's manual
>>> (http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html)
>>> but I can't get it to work. I'm using SGE 6.1 and MPICH2 1.04; the
>>> error appears in my .o# file and is the following:
>>>
>>> Warning: no access to tty (Bad file descriptor).
>>> Thus no job control in this shell.
>>>
>>
>>  http://gridengine.sunsource.net/howto/commonproblems.html
>>
>>  In a Linux cluster it is often best to set the following in the
>> queue configuration:
>>
>>  shell                 /bin/sh
>>  shell_start_mode      unix_behavior
>>
>>  to get the usual startup.
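
One way these two queue settings might be applied from the command
line, as a sketch only: the queue name all.q is an assumption (it is
merely the usual default), and the exact qconf invocation should be
checked against qconf(1) for your SGE version. The same values can
also be edited interactively with "qconf -mq <queue>".

```shell
# Assumed queue name; override with QUEUE=... if yours differs.
QUEUE="${QUEUE:-all.q}"

if command -v qconf >/dev/null 2>&1; then
    # Start jobs with /bin/sh and honor the script's first line.
    qconf -mattr queue shell /bin/sh "$QUEUE"
    qconf -mattr queue shell_start_mode unix_behavior "$QUEUE"
    # Show the resulting shell-related settings.
    qconf -sq "$QUEUE" | grep -E '^shell'
else
    echo "qconf not found; run this on an SGE admin host" >&2
fi
```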
>>
>>
>>> Aborting: unable to connect to tonatiuh.iingen.unam.mx, smpd version mismatch
>>>
>>
>>  The error message is clear IMO: "smpd version mismatch"
>>
>>  As I would go for the latest 1.0.7, I would suggest installing it
>> somewhere like ~/local/mpich2-1.0.7, so that it is certainly the same
>> version on all nodes (by setting a proper PATH to this version of the
>> binaries in ~/local/mpich2-1.0.7/bin). Was the 1.0.4 installed with
>> Rocks?
>>
>>  Warning: there are at least 4 possibilities to start up (i.e. compile)
>> MPICH2. With the one you chose, you must a) compile your application
>> with it, b) use exactly the mpiexec from this MPICH2 installation to
>> run it, and c) use the appropriate PE in SGE. It's not possible to
>> change the MPICH2 startup by simply using a different mpiexec or PE.
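
A quick sanity check for points a) and b) above could look like the
following sketch. It only compares the directories in which mpicc and
mpiexec are found on the current $PATH; the helper name same_install is
made up for illustration, and the check says nothing about which PE the
job requests.

```shell
# Succeed only when both paths are non-empty and share one directory.
same_install() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

# Resolve the tools exactly as a job script would find them.
mpicc_path=$(command -v mpicc || true)
mpiexec_path=$(command -v mpiexec || true)

if same_install "$mpicc_path" "$mpiexec_path"; then
    echo "OK: mpicc and mpiexec both come from $(dirname "$mpicc_path")"
else
    echo "WARNING: mpicc=${mpicc_path:-not found} mpiexec=${mpiexec_path:-not found}" >&2
fi
```

Running this inside the job script itself, rather than in a login
shell, shows the $PATH that the job actually sees on the node.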
>>
>>  -- Reuti
>>
>>
>>
>>> The output of the .po# file:
>>>
>>> -catch_rsh /opt/gridengine/default/spool/compute-0-3/active_jobs/2635.1/pe_hostfile
>>> compute-0-3
>>> compute-0-1
>>> compute-0-2
>>> tonatiuh
>>>
>>> Both error files appear empty, and the jobs run only for a split
>>> second in the cluster. Any ideas on what's going on?
>>>
>>> Thanks,
>>>
>>> Sergio.
>>>
>>> --
>>> Instituto de Ingeniería de la UNAM
>>> Coordinación de Sistemas de Cómputo
>>> Área de Sistemas Unix/Linux
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>>
>>
>>
>>
>>
>
>
> -- 
> Instituto de Ingeniería de la UNAM
> Coordinación de Sistemas de Cómputo
> Área de Sistemas Unix/Linux
>
>

