[GE users] Problems integrating MPICH2 and SGE

Instituto de Ingenieria Área de Sistemas Unix/Linux unix.iingen at gmail.com
Thu Apr 17 17:27:55 BST 2008



Well, the problem is still there.

I manually removed the 1.0.4 directory and installed 1.0.7 in its place,
also replicating it on the nodes. No luck. Same error.
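
One way to confirm whether every host really reports the same MPICH2
version is to collect the version string from each node and compare them.
A minimal Python sketch, assuming the strings have already been gathered
by hand; the helper name and the sample values are hypothetical and are
not taken from this cluster:

```python
from collections import defaultdict


def find_version_mismatches(versions):
    """Group hosts by the version string they report.

    `versions` maps host name -> version string.
    Returns {} when all hosts agree, otherwise {version: [hosts...]}.
    """
    groups = defaultdict(list)
    for host, version in versions.items():
        # Strip whitespace so "1.0.7" and "1.0.7\n" count as the same version.
        groups[version.strip()].append(host)
    if len(groups) <= 1:
        return {}
    return {version: sorted(hosts) for version, hosts in groups.items()}


# Hypothetical sample data: host names as in the PE host list,
# version strings invented for illustration only.
reported = {
    "tonatiuh": "1.0.4",
    "compute-0-1": "1.0.7",
    "compute-0-2": "1.0.7",
    "compute-0-3": "1.0.7",
}
mismatches = find_version_mismatches(reported)
```

An empty result means all hosts agree; otherwise each key lists the hosts
that still run that version.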

So I went *back* to 1.0.6. This time the error was different:

Aborting: unable to connect to tonatiuh.iingen.unam.mx

I figured that, as a quick fix until I find the root of the problem, I
could just comment out the frontend entry in the machines file. No luck:

op_connect error: socket connection failed, error stack:
MPIDU_Socki_handle_connect(791): connection failure
(set=1,sock=16777216,errno=111:(strerror() not found))
unable to connect mpiexec tree, socket connection failed, error stack:
MPIDU_Socki_handle_connect(791): connection failure
(set=1,sock=16777216,errno=111:(strerror() not found)).
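
On Linux, errno 111 is ECONNREFUSED, which usually means that nothing was
listening on the port the connection was aimed at. A small Python sketch
for probing a host and port by hand follows. The function names are
hypothetical, and the port to test is whatever the smpd on that node was
started with, which is not assumed here:

```python
import errno
import os
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try a TCP connect; return 0 on success, otherwise the errno value.

    Name resolution failures are not mapped to an errno; they raise
    socket.gaierror instead.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        return sock.connect_ex((host, port))
    finally:
        sock.close()


def describe(code):
    """Turn the result of probe_tcp() into a readable string."""
    if code == 0:
        return "connected"
    name = errno.errorcode.get(code, "unknown")
    return "%s: %s" % (name, os.strerror(code))


# Example use (host taken from the error above, port is a placeholder):
#   print(describe(probe_tcp("tonatiuh.iingen.unam.mx", SMPD_PORT)))
```

A "connection refused" result from the node where the job runs would
suggest that no smpd is listening there, rather than a routing problem.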

I'm clueless here.

2008/4/16, Reuti <reuti at staff.uni-marburg.de>:
> Hi,
>
>  On 16.04.2008 at 21:53, Instituto de Ingenieria Área de Sistemas
> Unix/Linux wrote:
>
>
> > I'm trying to install MPICH2 with tight integration with SGE. I
> > have followed Reuti's manual
> > (http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html)
> > but I can't get it to work. I'm using SGE 6.1 and MPICH2 1.0.4; the
> > error appears in my .o# file and is the following:
> >
> > Warning: no access to tty (Bad file descriptor).
> > Thus no job control in this shell.
> >
>
>  http://gridengine.sunsource.net/howto/commonproblems.html
>
>  In a Linux cluster it is often best to define in the queue:
>
>  shell                 /bin/sh
>  shell_start_mode      unix_behavior
>
>  to get the usual startup.
>
>
> > Aborting: unable to connect to tonatiuh.iingen.unam.mx, smpd version mismatch
> >
>
>  The error message is clear IMO: "smpd version mismatch"
>
>  As I would go for the latest 1.0.7, I would suggest installing it somewhere
> like ~/local/mpich2-1.0.7, so that it is certain to be the same version on
> all nodes (by setting a proper PATH to this version of the binaries in
> ~/local/mpich2-1.0.7/bin). Was the 1.0.4 installed with Rocks?
>
>  Warning: there are at least 4 possible ways to start up (i.e. compile)
> MPICH2. With the one you chose, you must a) compile your application with
> it, b) use exactly the mpiexec from this MPICH2 installation to run it, and
> c) use the appropriate PE in SGE. It's not possible to change the MPICH2
> startup by simply using a different mpiexec or PE.
>
>  -- Reuti
>
>
>
> > The output of the .po# file:
> >
> > -catch_rsh /opt/gridengine/default/spool/compute-0-3/active_jobs/2635.1/pe_hostfile
> > compute-0-3
> > compute-0-1
> > compute-0-2
> > tonatiuh
> >
> > Both error files appear empty, and the jobs run on the cluster for
> > only a split second. Any ideas on what's going on?
> >
> > Thanks,
> >
> > Sergio.
> >
> > --
> > Instituto de Ingeniería de la UNAM
> > Coordinación de Sistemas de Cómputo
> > Área de Sistemas Unix/Linux
> >
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail:
> users-help at gridengine.sunsource.net
> >
> >
>
>
>
>


-- 
Instituto de Ingeniería de la UNAM
Coordinación de Sistemas de Cómputo
Área de Sistemas Unix/Linux




