[GE users] Problems integrating MPICH2 and SGE

Reuti reuti at staff.uni-marburg.de
Wed Apr 16 23:05:35 BST 2008



Hi,

On 16.04.2008 at 21:53, Instituto de Ingenieria Área de
Sistemas Unix/Linux wrote:

> I'm trying to install MPICH2 with tight integration with the SGE. I
> have followed Reuti's manual
> (http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html)
> but I can't get it to work. I'm using SGE 6.1 and MPICH2 1.0.4; the
> error appears in my .o# file and is the following:
>
> Warning: no access to tty (Bad file descriptor).
> Thus no job control in this shell.

http://gridengine.sunsource.net/howto/commonproblems.html

In a Linux cluster it is often best to define in the queue configuration:

shell                 /bin/sh
shell_start_mode      unix_behavior

to get the usual startup.
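
To see what a queue currently uses, the two attributes can be picked out of the queue configuration. This is only a minimal sketch: all.q is a placeholder queue name, the helper name show_shell_settings is made up for this example, and it assumes the usual "attribute value" layout printed by `qconf -sq`.

```shell
#!/bin/sh
# Print only the shell-related attributes from a queue configuration
# read on stdin (the "attribute   value" layout of `qconf -sq`).
show_shell_settings() {
    awk '$1 == "shell" || $1 == "shell_start_mode" { print $1 "=" $2 }'
}

# Usage on a host where the SGE admin commands are available
# (all.q is a placeholder queue name):
#   qconf -sq all.q | show_shell_settings
# To change the values, open the queue configuration in an editor:
#   qconf -mq all.q
```

If the output shows a csh-type shell or a different shell_start_mode, that would match the "no access to tty" warning quoted above.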

> Aborting: unable to connect to tonatiuh.iingen.unam.mx, smpd  
> version mismatch

The error message is clear IMO: "smpd version mismatch"

As I would go for the latest 1.0.7 anyway, I would suggest installing it
somewhere like ~/local/mpich2-1.0.7, so that it is certainly the same
version on all nodes (set your PATH so that the binaries in
~/local/mpich2-1.0.7/bin are found first). Was the 1.0.4 installed
with Rocks?
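
A sketch of the PATH setup, assuming the prefix ~/local/mpich2-1.0.7 from above and a home directory that is shared across the nodes; the function name use_mpich2 is only for illustration.

```shell
#!/bin/sh
# Put one specific MPICH2 installation first in PATH so that every node
# picks up the same mpiexec, mpicc and smpd.
use_mpich2() {
    prefix=$1
    PATH="$prefix/bin:$PATH"
    export PATH
}

# In the job script or the shell startup file:
#   use_mpich2 "$HOME/local/mpich2-1.0.7"
#   command -v mpiexec   # should print a path below ~/local/mpich2-1.0.7/bin
```

Prepending rather than appending matters here: an older system-wide MPICH2 would otherwise still be found first.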

Warning: there are at least 4 possible startup methods for MPICH2, and
the method is fixed when MPICH2 is compiled. The one you chose must be
used consistently: a) compile your application with it, b) use exactly
the mpiexec from this MPICH2 installation to run it, c) use the
appropriate PE in SGE. It's not possible to change the MPICH2 startup
method by simply using a different mpiexec or PE.
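
Points a) and b) can be checked roughly by comparing where the compiler wrapper and mpiexec are found. This is only a sketch: the helper name same_install is made up, and it only compares directories, so it says nothing about the PE in c).

```shell
#!/bin/sh
# Succeed only if the two given commands resolve to the same directory,
# i.e. they most likely come from the same installation.
# Returns 2 if either command is not found at all.
same_install() {
    a=$(command -v "$1") || return 2
    b=$(command -v "$2") || return 2
    [ "$(dirname "$a")" = "$(dirname "$b")" ]
}

# Usage:
#   same_install mpicc mpiexec ||
#       echo "mpicc and mpiexec are not from the same directory" >&2
```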

-- Reuti


> The output of the .po# file:
>
> -catch_rsh /opt/gridengine/default/spool/compute-0-3/active_jobs/2635.1/pe_hostfile
> compute-0-3
> compute-0-1
> compute-0-2
> tonatiuh
>
> both error files appear empty, and the jobs run for a split second in
> the cluster only. Any ideas on what's going on?
>
> Thanks,
>
> Sergio.
>
> -- 
> Instituto de Ingeniería de la UNAM
> Coordinación de Sistemas de Cómputo
> Área de Sistemas Unix/Linux
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>

