[GE users] MPI problems persist

reuti reuti at staff.uni-marburg.de
Mon Nov 15 09:53:27 GMT 2010


On 15.11.2010 at 07:04, heine wrote:

> Reuti,
> 
> No, the firewall does not restrict traffic on the 'private' network, and I do not have SELinux enabled. I already mentioned that I can run mpiexec successfully with any program using -np (x) and --hostfile (xxxx). It just does not work when I submit the job to Grid Engine. I can also run it successfully in a test configuration using Torque.

I lost track of this issue a bit while I was away last week. To confirm: you compiled Open MPI with "--with-sge", compiled the application with the wrappers from that build, and call the matching `mpiexec` in your jobscript?

If so, the default tight integration should work. Otherwise Open MPI will fall back to its default `ssh` launcher.
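
One way to check the first point is to look at what `ompi_info` reports for the build that is first on the PATH. The following is only a rough sketch: it assumes that `ompi_info` prints one component per line in the form "MCA ras: gridengine (...)", and the helper names `gridengine_frameworks` and `check_local_build` are made up for illustration, not part of Open MPI or Grid Engine.

```python
import re
import shutil
import subprocess

# Assumed line format: "MCA ras: gridengine (MCA v2.0, ...)".
_GRIDENGINE_LINE = re.compile(r"^\s*MCA\s+(\w+):\s+gridengine\b", re.MULTILINE)


def gridengine_frameworks(ompi_info_text):
    """Return the sorted MCA framework names that list a gridengine component."""
    return sorted(set(_GRIDENGINE_LINE.findall(ompi_info_text)))


def check_local_build():
    """Run the first ompi_info on PATH and summarize what it says about SGE."""
    exe = shutil.which("ompi_info")
    if exe is None:
        return "ompi_info not found on PATH"
    out = subprocess.run([exe], capture_output=True, text=True).stdout
    found = gridengine_frameworks(out)
    if found:
        return f"{exe}: gridengine component in {', '.join(found)}"
    return f"{exe}: no gridengine component listed; was --with-sge used?"


if __name__ == "__main__":
    print(check_local_build())
```

Running it both on the head node and from inside a jobscript would also show whether the job picks up the same installation as the interactive shell, since the reported path is whatever `ompi_info` is found first on the PATH in each environment.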

-- Reuti


> Thank you
> Heine
> 
> On Sun, 2010-11-14 at 17:21 +0200, reuti wrote:
>> On 10.11.2010 at 07:09, heine wrote:
>> 
>> > <snip>
>> > The statement is 'This may be because the daemon was unable to find all the needed shared libraries on the remote node.' I guess that could be the cause, but when something as simple as the hostname command fails, it seems to point to something else?
>> 
>> Possibly. Did you move your Open MPI installation to a different location after compiling it?
>> 
>> Do you have any firewall on the machines or SELinux enabled?
>> 
>> -- Reuti
>> 
>> 
>> >>> location of the shared libraries on the remote nodes and this will
>> >>> automatically be forwarded to the remote nodes.
>> >>> --------------------------------------------------------------------------
>> >>> --------------------------------------------------------------------------
>> >>> mpirun noticed that the job aborted, but has no info as to the process
>> >>> that caused that situation.
>> >>> --------------------------------------------------------------------------
>> >>> --------------------------------------------------------------------------
>> >>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>> >>> below. Additional manual cleanup may be required - please refer to
>> >>> the "orte-clean" tool for assistance.
>> >>> --------------------------------------------------------------------------
>> >>>         comp019 - daemon did not report back when launched
>> >>>         comp017 - daemon did not report back when launched
>> >>> 
>> >>> Thanks
>> >>> Heine
>> >>> 
>> >>> 
>> > 
>> > 
>> > -- 
>> > 
>> > Heine de Jager  * Stelsel Administrateur * Universiteit Stellenbosch * Tel: 021 808 4989
>> 
> 
> 
> -- 
> 
> Heine de Jager  * Stelsel Administrateur * Universiteit Stellenbosch * Tel: 021 808 4989

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=295807
