[GE users] Re: mpich <-> sge --> controlling hosts machinefile

Reuti reuti at staff.uni-marburg.de
Thu Jul 5 13:03:02 BST 2007


Okay, then adjust startmpi.sh so that it appends .rhrk.uni-kl.de to each
line in the machinefile. The machinefile is written by the
PeHostfile2MachineFile() subroutine.
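
A minimal sketch of what such a change could look like, written as a
standalone variant of PeHostfile2MachineFile(). It is not a copy of the
function shipped with your installation, so compare it against your own
startmpi.sh before changing anything. The DOMAIN variable is an assumption
introduced here for illustration, and the sketch assumes that each line of
$pe_hostfile starts with the host name followed by the slot count.

```shell
#!/bin/sh
# Sketch only: emit one fully qualified host name per slot.
# DOMAIN is a hypothetical variable; the value is the domain named in this thread.
DOMAIN="rhrk.uni-kl.de"

PeHostfile2MachineFile()
{
   # Assumed $pe_hostfile line layout: <host> <slots> <further fields...>
   while read host nslots rest; do
      # Skip blank or incomplete lines.
      [ -n "$nslots" ] || continue
      # Drop any domain already present so it is never appended twice.
      host=${host%%.*}
      i=1
      while [ "$i" -le "$nslots" ]; do
         echo "$host.$DOMAIN"
         i=$((i + 1))
      done
   done < "$1"
}
```

With this in place, a call such as PeHostfile2MachineFile $pe_hostfile
would print lc12.rhrk.uni-kl.de twice for a two-slot entry for lc12. This
only helps if `hostname` on the nodes returns the same fully qualified
form, which is what the first line of the generated file in your mail
suggests.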

-- Reuti


On 05.07.2007 at 13:32, Gerolf Ziegenhain wrote:

> This was a different job; but the same problem.
>
> /BR: Gerolf
>
> 2007/7/5, Reuti <reuti at staff.uni-marburg.de>:
> On 05.07.2007 at 12:07, Gerolf Ziegenhain wrote:
>
> > I was able to track it down even more to this:
> >
> > ssh lc12 cat /tmp/244224.1.q_mpich/machines
> > lc12
> > lc12
> > lc19
> > lc19
> > lc14
> > lc14
> > lc13
> > lc13
> >
> > The master node of the 8-processor job has a good-looking
> > machinefile.
> >
> > From this, the currently running process creates the following
> > machinefile:
> > lc12.rhrk.uni-kl.de 0 /gu2/ziegen/LINUX/data/bin//lmp_rhrk_parallel
> > lc19 1 /gu2/ziegen/LINUX/data/bin//lmp_rhrk_parallel
> > lc19 1 /gu2/ziegen/LINUX/data/bin//lmp_rhrk_parallel
> > lc14 1 /gu2/ziegen/LINUX/data/bin//lmp_rhrk_parallel
> > lc14 1 /gu2/ziegen/LINUX/data/bin//lmp_rhrk_parallel
> > lc13 1 /gu2/ziegen/LINUX/data/bin//lmp_rhrk_parallel
> > lc13 1 /gu2/ziegen/LINUX/data/bin//lmp_rhrk_parallel
> > lc19 1 /gu2/ziegen/LINUX/data/bin//lmp_rhrk_parallel
> >
> > Where is the machinefile created? Is this done by
> > $SGE_ROOT/mpi/startmpi.sh -catch_rsh $pe_hostfile?
>
> Was this the same job? At one point I saw three processes on node12 in
> your output, and you stated node19. MPICH removes one line from the
> machinefile: the one that matches the output of `hostname`. If
> `hostname` returns something other than node12, the startmpi.sh
> procedure must be adjusted so that one line can indeed be removed.
> Otherwise the machinefile will not be scanned completely, or it will
> be scanned more than once.
>
> Some details you may find here:
>
> http://gridengine.sunsource.net/howto/mpich-integration.html
>
> -- Reuti
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
>
>
> -- 
> Dipl. Phys. Gerolf Ziegenhain
> Office: Room 46-332 - Erwin-Schrödinger-Str.46 - TU Kaiserslautern  
> - Germany
> Web: gerolf.ziegenhain.com
