[GE users] Rmpi under SGE

reuti reuti at staff.uni-marburg.de
Fri Dec 17 10:28:24 GMT 2010


Hi,

On 17.12.2010 at 11:04, arnuschky wrote:

> We're having massive problems using Rmpi with OpenMPI under SGE. OpenMPI
> is tested and works fine. We're submitting one master Rscript, which
> in turn spawns the required slaves using Rmpi. Unfortunately, this
> fails:
> 
>        $ cat testRmpi.e3480556
>        Warning: Permanently added 'compute-1-13.local' (RSA) to the list of known hosts.
>        Warning: Permanently added 'compute-1-10.local' (RSA) to the list of known hosts.
>        Warning: Permanently added 'compute-1-11.local' (RSA) to the list of known hosts.
>        Warning: Permanently added 'compute-1-12.local' (RSA) to the list of known hosts.
>        Warning: Permanently added 'compute-1-14.local' (RSA) to the list of known hosts.
>        Permission denied, please try again.

When Open MPI is tightly integrated with SGE, I would assume that SGE itself is configured to use "ssh" for remote startup. What is the output of `qconf -sconf`? There might be duplicate entries:

http://marc.info/?l=npaci-rocks-discussion&m=126411729709528
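
To make the check for duplicate entries less error-prone than reading the output by eye, here is a minimal sketch in Python. It assumes the usual `qconf -sconf` layout of one "parameter value" pair per line, with a `#global:` header and indented continuation lines for wrapped values. The function name `find_duplicate_keys` and the sample text are made up for illustration and are not taken from the cluster in this thread.

```python
from collections import defaultdict


def find_duplicate_keys(conf_text):
    """Return {parameter: [values...]} for every parameter listed more than once."""
    seen = defaultdict(list)
    for line in conf_text.splitlines():
        # Skip blank lines, comment headers such as "#global:", and
        # indented continuation lines of wrapped values (assumed layout).
        if not line.strip() or line.startswith("#") or line[0].isspace():
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        seen[key].append(value)
    return {k: v for k, v in seen.items() if len(v) > 1}


# Hypothetical sample output, for illustration only.
SAMPLE = """\
#global:
qlogin_command               builtin
rsh_command                  builtin
rsh_daemon                   builtin
rsh_command                  /usr/bin/ssh
rsh_daemon                   /usr/sbin/sshd -i
"""

if __name__ == "__main__":
    for key, values in find_duplicate_keys(SAMPLE).items():
        print(key, "->", values)
```

On a real cluster the text could come from `subprocess.run(["qconf", "-sconf"], capture_output=True, text=True).stdout` instead of `SAMPLE`. Any parameter reported with two different values (for example `rsh_command` set once to `builtin` and once to an ssh path) would be worth cleaning up first.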

If you want or need to use ssh, you need either passphraseless ssh keys (deprecated) or host-based authentication:

http://gridengine.sunsource.net/howto/hostbased-ssh.html

-- Reuti


>        Permission denied, please try again.
>        Permission denied (publickey,gssapi-with-mic,password).
>        Permission denied, please try again.
>        Permission denied, please try again.
>        Permission denied (publickey,gssapi-with-mic,password).
>        Permission denied, please try again.
>        Permission denied, please try again.
>        Permission denied, please try again.
>        Permission denied (publickey,gssapi-with-mic,password).
>        Permission denied, please try again.
>        Permission denied (publickey,gssapi-with-mic,password).
>        Permission denied, please try again.
>        Permission denied, please try again.
>        Permission denied (publickey,gssapi-with-mic,password).
>        --------------------------------------------------------------------------
>        A daemon (pid 26953) died unexpectedly with status 129 while attempting
>        to launch so we are aborting.
> 
>        There may be more information reported by the environment (see above).
> 
>        This may be because the daemon was unable to find all the needed shared
>        libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>        location of the shared libraries on the remote nodes and this will
>        automatically be forwarded to the remote nodes.
>        --------------------------------------------------------------------------
>        mpirun: clean termination accomplished
> 
> We're using openmpi-1.3.3 (--with-sge) and SGE V62u4.
> 
> Any hints on what's going wrong here?
> 
> Cheers,
> Arne
> 
> -- 
> Arne Brutschy
> Ph.D. Student                    Email    arne.brutschy(AT)ulb.ac.be
> IRIDIA CP 194/6                  Web      iridia.ulb.ac.be/~abrutschy
> Universite' Libre de Bruxelles   Tel      +32 2 650 2273
> Avenue Franklin Roosevelt 50     Fax      +32 2 650 2715
> 1050 Bruxelles, Belgium          (Fax at IRIDIA secretary)
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=306389
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=306398