[GE users] mpi problems

Craig Tierney Craig.Tierney at noaa.gov
Wed Apr 30 17:55:07 BST 2008



Reuti wrote:
> Hi,
> 
> Am 30.04.2008 um 17:53 schrieb Roberta Gigon:
> 
>> Very strange happenings here indeed.
>>
>> I made the changes you suggested, and now the job will run if I set
>> -pe mpi 2 and -np 2, but it fails on any more than 2 nodes.  If I run
>> the same job independently of SGE, it still runs fine regardless of
>> the -np setting.
>>
>> We have the mpd master running all the time on the cluster head node 
>> and the mpd slaves running all the time on the nodes in mpi.q.
> 
> Since the "initial" mpd daemon is running on the head node of the
> cluster, I would assume that the slave daemons simply don't know
> anything about how to contact the other slaves. The master task of
> your parallel job might always need to run on the head node and
> instruct the mpd daemon on this node where to start the slave
> processes.
> 
> You could try this, but requesting one master and n slave nodes is
> not available in SGE for now.

If the head node is node01 in mpi.q, you can submit your job as

qsub -masterq mpi.q@node01 -pe $PENAME $NP $SGESCRIPT

to get what the user wants.

However, this isn't very flexible: it means that all jobs have the
same head node.  For MVAPICH2, we have replaced mpirun with a script
that does the following (see the sketch after this list):

- get the nodes from $PE_HOSTFILE (or some other file)
- remove the head node (the local node) from this list
- start mpd in the background and grab its port
- call pdsh to start mpd with the proper port on all other nodes
- call mpiexec with the command line options that were passed to mpirun
- call mpdallexit

Users don't have to know about the ring, and you are not tied to a
particular head node.
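
A minimal sketch of such a wrapper (assuming mpd, mpdtrace, mpiexec,
mpdallexit and pdsh are in $PATH on all nodes; variable names and the
port parsing are illustrative, not our production script):

   #!/bin/sh
   # Hypothetical mpirun replacement for MVAPICH2 under SGE (sketch).

   # 1. Get the allocated hosts from SGE's $PE_HOSTFILE
   #    (one "host nslots queue ..." line per node).
   HOSTS=`cut -f1 -d" " $PE_HOSTFILE | cut -f1 -d"."`

   # 2. Remove the head node (the local host) from the list.
   HEAD=`hostname | cut -f1 -d"."`
   SLAVES=`echo "$HOSTS" | grep -v "^$HEAD\$"`

   # 3. Start mpd here in the background and grab the port it listens
   #    on (mpdtrace -l prints lines like "host_port (ip)").
   mpd --daemon
   PORT=`mpdtrace -l | head -1 | sed -e "s/.*_//" -e "s/ .*//"`

   # 4. Have pdsh start an mpd on every other node, joining this ring.
   pdsh -w "`echo $SLAVES | tr ' ' ','`" mpd --daemon -h $HEAD -p $PORT

   # 5. Run the job with whatever options were passed to this "mpirun".
   mpiexec "$@"
   RC=$?

   # 6. Tear the ring down.
   mpdallexit
   exit $RC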

Craig




> 
> -- Reuti
> 
> 
>> Puzzled,
>> Roberta
>>
>> --------------------------------------------------------------------------------------------- 
>>
>> Roberta M. Gigon
>> Schlumberger-Doll Research
>> One Hampshire Street, MD-B253
>> Cambridge, MA 02139
>> 617.768.2099 - phone
>> 617.768.2381 - fax
>>
>> This message is considered Schlumberger CONFIDENTIAL.  Please treat 
>> the information contained herein accordingly.
>>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Monday, April 28, 2008 6:17 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] mpi problems
>>
>> Hi Roberta,
>>
>> Am 28.04.2008 um 19:16 schrieb Roberta Gigon:
>>
>>> I have also tried using -machinefile $TMPDIR/machines in the script
>>> file and get the same result.
>>
>> for the mpd method you would need to modify the
>> PeHostfile2MachineFile subroutine in startmpi.sh:
>>
>>     # emit one host:nslots line per $pe_hostfile entry
>>     cat $1 | while read line; do
>>        host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`   # short hostname
>>        nslots=`echo $line|cut -f2 -d" "`               # slot count
>>        echo $host:$nslots
>>     done
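>>
>> For example, a hostfile line beginning "node01.cl.slb.com 2 ..."
>> (hostname illustrative) would come out as "node01:2", which is the
>> host:nslots format an mpd machinefile expects.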
>>
>>> We have been using the mpd method.
>>
>> There is no tight integration for the mpd method (therefore it's not
>> in the Howto), as the daemons would always start without being
>> controlled by SGE. But even if you would like only a loose
>> integration: how is this working with PBS? Are you starting a ring of
>> mpds per job, according to page 18 of the MPICH2 manual? (And how is
>> this working with two different jobs on one node? mpdallexit would
>> also shut down the daemons of the other job, as it takes no arguments
>> AFAICS.)
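>>
>> (One possible workaround, sketched here on the assumption that
>> MPICH2's MPD_CON_EXT mechanism is available: it extends the name of
>> the mpd console, so each job can run its own ring, and mpdallexit
>> then only reaches the ring it was started for. Illustrative only:
>>
>>     export MPD_CON_EXT="sge_$JOB_ID"   # per-job ring/console name
>>     mpd --daemon                       # start this job's own ring
>>     ...                                # mpiexec as usual
>>     mpdallexit                         # stops only this job's ring
>> )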
>>
>>> The master is on the head node of the cluster and the nodes are all
>>> mpd "slaves".
>>
>> Or do you have the ring simply always active across the complete
>> cluster?
>>
>>>   I didn't see instructions for the mpd method in the how-to.  The
>>> program we are using with MPICH-2 is MCNP from Los Alamos National
>>> Labs; I'm not sure if it works with any of the other methods.
>>
>> MCNP is not publicly available. Do you have the source, so that you
>> could compile it with another mpicc or so?
>>
>> -- Reuti
>>
>>
>>>
>>> Thanks!
>>> Roberta
>>>
>>> P.S.  Regarding the bear72/bear75 confusion... I cut and pasted the
>>> wrong error file entry... in reality, it is consistent.
>>>
>>>
>>> -----Original Message-----
>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>> Sent: Monday, April 28, 2008 12:07 PM
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] mpi problems
>>>
>>> Hi,
>>>
>>> Am 28.04.2008 um 17:23 schrieb Roberta Gigon:
>>>
>>>> I'm having a few issues with getting MPICH-2  to work under SGE. I
>>>> have an mpi job that works just fine with PBS and outside of SGE,
>>>> so I'm pretty confident in saying that MPI itself is working.
>>>
>>> the included $SGE_ROOT/mpi is only for MPICH(1). There is a Howto for
>>> MPICH2:
>>>
>>> http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
>>>
>>> Just take note that MPICH2 can be compiled in at least 4 different
>>> ways, and the compilation (of your application) must use the
>>> appropriate mpirun and SGE PE. Which type of startup do you want to
>>> use?
>>>
>>> Anyway: you have no -machinefile or similar in your mpirun call,
>>> hence all processes will run locally. And how is it getting from
>>> bear72 to bear75 - do you have a predefined mpd.hosts which could
>>> trigger this?
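>>>
>>> (For example - a sketch only, where $TMPDIR/machines is the machine
>>> file that startmpi.sh writes for the job:
>>>
>>>    mpirun -np $NSLOTS -machinefile $TMPDIR/machines ...
>>> )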
>>>
>>> -- Reuti
>>>
>>> PS: Please try the latest 1.0.7 of MPICH2 (although your 1.0.4p1
>>> should be fine); 1.0.6p1, at least, is broken.
>>>
>>>
>>>>
>>>> Some background:
>>>> I have a pe called mpi with these characteristics:
>>>>
>>>> [root@bear ~]$ qconf -sp mpi
>>>> pe_name           mpi
>>>> slots             999
>>>> user_lists        NONE
>>>> xuser_lists       NONE
>>>> start_proc_args   /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
>>>> stop_proc_args    /opt/sge/mpi/stopmpi.sh
>>>> allocation_rule   $round_robin
>>>> control_slaves    FALSE
>>>> job_is_first_task TRUE
>>>> urgency_slots     min
>>>>
>>>> I have a queue called mpi.q with 6 dual processor nodes (12 slots)
>>>>
>>>> I submit the job like this:  qsub -q mpi.q -pe mpi 6 -cwd ./
>>>> sbt034.csh
>>>>
>>>> sbt034.csh:
>>>> #! /bin/tcsh
>>>>
>>>> #$ -q mpi.q
>>>> #$ -j y
>>>> #$ -o testSGE2.out
>>>> #$ -N testSGE2
>>>> #$ -cwd
>>>> #$ -pe mpi 6
>>>>
>>>> echo running...
>>>> echo $TMPDIR
>>>> /usr/local/mpich2-1.0.4p1-pgi-k8-64/bin/mpirun -np 6 /people8/tzhou/
>>>> mcnprun/SUN/bin/mcnp 5j.mpi i=sbt034 wwinp=sbwwmx05 eol
>>>> echo done!
>>>>
>>>> qstat says:
>>>>
>>>> tzhou@bear[162] qstat
>>>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>>>> -----------------------------------------------------------------------------------------------------------------
>>>>    6862 0.56000 testSGE2   tzhou        r     04/28/2008 10:52:48 mpi.q@bear72.cl.slb.com            6
>>>>
>>>> error file says:
>>>> master starting       5 tasks with       1 threads each  **/**/08
>>>> **:**:10
>>>>  master sending static commons...
>>>>  master sending dynamic commons...
>>>>  master sending cross section data...
>>>> PGFIO/stdio: No such file or directory
>>>> PGFIO-F-/OPEN/unit=32/error code returned by host stdio - 2.
>>>>  In source file msgtsk.f90, at line number 116
>>>> PGFIO/stdio: No such file or directory
>>>> PGFIO-F-/OPEN/unit=32/error code returned by host stdio - 2.
>>>>  In source file msgtsk.f90, at line number 116
>>>> rank 4 in job 4  bear75.cl.slb.com_47485   caused collective abort
>>>> of all ranks
>>>>   exit status of rank 4: killed by signal 9
>>>> done!
>>>>
>>>> The $TMPDIR gets set properly...
>>>>
>>>> Any thoughts on what might be happening here?
>>>>
>>>> Many thanks,
>>>> Roberta
>>>>


-- 
Craig Tierney (craig.tierney at noaa.gov)
