[GE users] mpi problems

Reuti reuti at staff.uni-marburg.de
Wed Apr 30 18:04:49 BST 2008


On 30.04.2008, at 18:55, Craig Tierney wrote:

> Reuti wrote:
>> Hi,
>> On 30.04.2008, at 17:53, Roberta Gigon wrote:
>>> Very strange happenings here indeed.
>>>
>>> I made the changes you suggested and now the job will run if I
>>> set -pe mpi 2 and -np 2, but it fails on anything more than 2 nodes.
>>> If I run the same job independently of SGE, it still runs fine
>>> regardless of the -np setting.
>>>
>>> We have the mpd master running all the time on the cluster head  
>>> node and the mpd slaves running all the time on the nodes in mpi.q.
>> My guess is that, since the "initial" mpd daemon is running on the
>> head node of the cluster, the slave daemons simply don't know how to
>> contact the other slaves. The master task of your parallel job might
>> then always need to run on the head node and instruct the mpd daemon
>> on that node where to start the slave processes. You could try this,
>> but requesting one master plus n slave nodes is not available in SGE
>> for now.
>
> If the head node is node01 in mpi.q, you can get what the user wants
> by submitting the job as:
>
> qsub -masterq mpi.q@node01 -pe $PENAME $NP $SGESCRIPT
>
> However, this isn't very flexible: it means that all jobs have the
> same head node.  For mvapich2, we have replaced mpirun with a script
> that does the following:
>
> - Get the nodes of the job from $PE_HOSTFILE or some other file
> - Remove the head node (the local node) from this list
> - Start mpd in the background and grab its port
> - Call pdsh to start mpd with the proper port on all the other nodes
> - Call mpiexec with the command-line options that were passed to mpirun
> - Call mpdallexit
>
> Users don't have to know about the ring, and you are not tied to a
> particular head node.
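>
> Roughly, such a wrapper could look like this (a sketch only: the
> install path, the exact mpd/pdsh option spellings and the hostname
> handling are assumptions and have to be adapted to the local
> installation - it is not the production script):
>
>    #!/bin/sh
>    # mpirun replacement for an mpd-based MPICH2/MVAPICH2 under SGE.
>    MPIBIN=/usr/local/mpich2-1.0.4p1-pgi-k8-64/bin   # assumed install path
>
>    # 1. Nodes of this job, minus the local (head) node.  Assumes the
>    #    names in $PE_HOSTFILE match what `hostname` returns.
>    HEAD=`hostname`
>    cut -f1 -d" " $PE_HOSTFILE | grep -v "^$HEAD$" > $TMPDIR/otherhosts
>
>    # 2. Start mpd here in the background and grab the port it listens
>    #    on (mpdtrace -l prints entries like "host_port (ip)").
>    $MPIBIN/mpd --daemon
>    sleep 2   # give the local mpd a moment to come up
>    PORT=`$MPIBIN/mpdtrace -l | head -1 | sed 's/.*_\([0-9][0-9]*\).*/\1/'`
>
>    # 3. Let every other node of the job join the ring (pdsh fans this
>    #    out; assumes $MPIBIN is visible on all nodes).
>    pdsh -w `paste -s -d, $TMPDIR/otherhosts` \
>        $MPIBIN/mpd --daemon --host=$HEAD --port=$PORT
>
>    # 4. Run the real job with whatever options were passed to "mpirun".
>    $MPIBIN/mpiexec "$@"
>    RC=$?
>
>    # 5. Tear the ring down again.
>    $MPIBIN/mpdallexit
>    exit $RC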
>
> Craig
>
>
>
>
>> -- Reuti
>>> Puzzled,
>>> Roberta
>>>
>>> ---------------------------------------------------------------------
>>> Roberta M. Gigon
>>> Schlumberger-Doll Research
>>> One Hampshire Street, MD-B253
>>> Cambridge, MA 02139
>>> 617.768.2099 - phone
>>> 617.768.2381 - fax
>>>
>>> This message is considered Schlumberger CONFIDENTIAL.  Please  
>>> treat the information contained herein accordingly.
>>>
>>> -----Original Message-----
>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>> Sent: Monday, April 28, 2008 6:17 PM
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] mpi problems
>>>
>>> Hi Roberta,
>>>
>>> On 28.04.2008, at 19:16, Roberta Gigon wrote:
>>>
>>>> I have also tried using -machinefile $TMPDIR/machines in the script
>>>> file and get the same result.
>>>
>>> For the mpd-method you would need to modify the
>>> PeHostfile2MachineFile subroutine in startmpi.sh so that it writes
>>> one host:slots entry per node (the format the mpd-based tools expect):
>>>
>>>     cat $1 | while read line; do
>>>        # each line of the PE hostfile is: host slots queue processors
>>>        host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`   # strip the domain
>>>        nslots=`echo $line|cut -f2 -d" "`               # slots on that host
>>>        echo $host:$nslots
>>>     done
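>>>
>>> The job script could then point mpirun at the generated file. A
>>> sketch only, reusing the paths from your own script and assuming the
>>> mpd-based mpirun accepts a host:slots machinefile ($NSLOTS is set by
>>> SGE):
>>>
>>>     /usr/local/mpich2-1.0.4p1-pgi-k8-64/bin/mpirun \
>>>         -machinefile $TMPDIR/machines -np $NSLOTS \
>>>         /people8/tzhou/mcnprun/SUN/bin/mcnp 5j.mpi i=sbt034 wwinp=sbwwmx05 eol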
>>>
>>>> We have been using the mpd method.
>>>
>>> There is no tight integration for the mpd-method (which is why it's
>>> not in the Howto), as the daemons would always start without being
>>> controlled by SGE. But even if you only want a loose integration:
>>> how is this working with PBS - are you starting a ring of mpds per
>>> job, as on page 18 of the MPICH2 manual? (And how does this work with
>>> two different jobs on one node? mpdallexit would also shut down the
>>> daemons of the other job, as it takes no arguments AFAICS.)
>>>
>>>> The master is on the head node of the cluster and the nodes are all
>>>> mpd "slaves".
>>>
>>> Or do you have the ring simply always active across the complete
>>> cluster?
>>>
>>>>   I didn't see instructions for the mpd method in the how-to.  The
>>>> program we are using with MPICH-2 is MCNP from Los Alamos National
>>>> Labs; I'm not sure if it works with any of the other methods.
>>>
>>> MCNP is not publicly available. Do you have the source, and could
>>> you recompile it with a different mpicc or the like?
>>>
>>> -- Reuti
>>>
>>>
>>>>
>>>> Thanks!
>>>> Roberta
>>>>
>>>> P.S.  Regarding the bear72/bear75 confusion... I cut and pasted the
>>>> wrong error file entry... in reality, it is consistent.
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>>> Sent: Monday, April 28, 2008 12:07 PM
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] mpi problems
>>>>
>>>> Hi,
>>>>
>>>> On 28.04.2008, at 17:23, Roberta Gigon wrote:
>>>>
>>>>> I'm having a few issues with getting MPICH-2  to work under SGE. I
>>>>> have an mpi job that works just fine with PBS and outside of SGE,
>>>>> so I'm pretty confident in saying that MPI itself is working.
>>>>
>>>> the included $SGE_ROOT/mpi is only for MPICH(1). There is a  
>>>> Howto for
>>>> MPICH2:
>>>>
>>>> http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
>>>>
>>>> Just note that MPICH2 can be compiled in at least four different
>>>> ways, and your application must be compiled and launched with the
>>>> matching mpirun and SGE PE. Which type of startup do you want to
>>>> use?
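>>>>
>>>> If you're not sure how your MPICH2 was built: a 1.0.x install usually
>>>> puts an mpich2version tool next to mpirun that prints the configure
>>>> options (an assumption about your install - adjust the path as needed):
>>>>
>>>>     /usr/local/mpich2-1.0.4p1-pgi-k8-64/bin/mpich2version
>>>>     # look for --with-pm=... (mpd, smpd, gforker) in the configure
>>>>     # line; if it is absent, the default process manager (mpd) was used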
>>>>
>>>> Anyway: you have no -machinefile or similar in your mpirun call,
>>>> hence everything will run on the local node. And how is the job
>>>> getting from bear72 to bear75 - do you have a predefined mpd.hosts
>>>> which could trigger this?
>>>>
>>>> -- Reuti
>>>>
>>>> PS: Please try the latest 1.0.7 of MPICH2 (although your 1.0.4p1
>>>> should be fine), at least 1.0.6p1 is broken.
>>>>
>>>>
>>>>>
>>>>> Some background:
>>>>> I have a pe called mpi with these characteristics:
>>>>>
>>>>> [root at bear ~]$ qconf -sp mpi
>>>>> pe_name           mpi
>>>>> slots             999
>>>>> user_lists        NONE
>>>>> xuser_lists       NONE
>>>>> start_proc_args   /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
>>>>> stop_proc_args    /opt/sge/mpi/stopmpi.sh
>>>>> allocation_rule   $round_robin
>>>>> control_slaves    FALSE
>>>>> job_is_first_task TRUE
>>>>> urgency_slots     min
>>>>>
>>>>> I have a queue called mpi.q with 6 dual processor nodes (12 slots)
>>>>>
>>>>> I submit the job like this:  qsub -q mpi.q -pe mpi 6 -cwd ./sbt034.csh
>>>>>
>>>>> sbt034.csh:
>>>>> #! /bin/tcsh
>>>>>
>>>>> #$ -q mpi.q
>>>>> #$ -j y
>>>>> #$ -o testSGE2.out
>>>>> #$ -N testSGE2
>>>>> #$ -cwd
>>>>> #$ -pe mpi 6
>>>>>
>>>>> echo running...
>>>>> echo $TMPDIR
>>>>> /usr/local/mpich2-1.0.4p1-pgi-k8-64/bin/mpirun -np 6 /people8/tzhou/mcnprun/SUN/bin/mcnp 5j.mpi i=sbt034 wwinp=sbwwmx05 eol
>>>>> echo done!
>>>>>
>>>>> qstat says:
>>>>>
>>>>> tzhou at bear[162] qstat
>>>>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>>    6862 0.56000 testSGE2   tzhou        r     04/28/2008 10:52:48 mpi.q@bear72.cl.slb.com            6
>>>>>
>>>>> error file says:
>>>>> master starting       5 tasks with       1 threads each  **/**/08 **:**:10
>>>>>  master sending static commons...
>>>>>  master sending dynamic commons...
>>>>>  master sending cross section data...
>>>>> PGFIO/stdio: No such file or directory
>>>>> PGFIO-F-/OPEN/unit=32/error code returned by host stdio - 2.
>>>>>  In source file msgtsk.f90, at line number 116
>>>>> PGFIO/stdio: No such file or directory
>>>>> PGFIO-F-/OPEN/unit=32/error code returned by host stdio - 2.
>>>>>  In source file msgtsk.f90, at line number 116
>>>>> rank 4 in job 4  bear75.cl.slb.com_47485   caused collective abort of all ranks
>>>>>   exit status of rank 4: killed by signal 9
>>>>> done!
>>>>>
>>>>> The $TMPDIR gets set properly...
>>>>>
>>>>> Any thoughts on what might be happening here?
>>>>>
>>>>> Many thanks,
>>>>> Roberta
>>>>>
>>>>>
>>>>
>>>>
>
>
> -- 
> Craig Tierney (craig.tierney at noaa.gov)
>

