[GE users] OpenMPI job stays on one node

reuti reuti at staff.uni-marburg.de
Mon Sep 7 12:17:41 BST 2009


Hi,

On 07.09.2009 at 13:05, sgexav wrote:

> reuti wrote:
>> Hi,
>>
>> On 07.09.2009 at 12:04, sgexav wrote:
>>
>>
>>> Hi,
>>> I am running a parallel code with Open MPI.
>>> The code needs 12 processors and each node has 8.
>>>
>>> With a plain mpirun -np 12 ........ it works perfectly.
>>>
>>> But under SGE all 12 processes stay on one node.
>>>
>>> qsub script.sh
>>> Here is my submission script:
>>> #!/bin/sh
>>> #
>>> # job name
>>> #$ -N ROMS_TEST
>>> #
>>> # Use current working directory
>>> #$ -cwd
>>> #
>>> # Join stdout and stderr
>>> #$ -j y
>>>
>>> # define queue
>>> #$ -q all.q
>>>
>>> #$ -pe mpi 12
>>>
>>> # Run job through bash shell
>>> #$ -S /bin/bash
>>> #
>>> echo "Got $NSLOTS processors."
>>> echo "Machines:"
>>> cat $TMPDIR/machines
>>>
>>> export PATH=/opt/openmpi-1.3.3/bin/:$PATH
>>> export LD_LIBRARY_PATH=/opt/openmpi-1.3.3/lib:/opt/intel/Compiler/11.1/046/lib/intel64:$LD_LIBRARY_PATH
>>> #export OMPI_MCA_pls_rsh_agent=rsh
>>>
>>> mpirun -np $NSLOTS  -hostfile $TMPDIR/machines ./roms roms.in
>>>
>>
>> a) Did you compile Open MPI with SGE support?
>>
>> b) What does your PE look like? Did you follow the Howto for SGE on
>> the Open MPI site (http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge)?
>>
>>
> And if I use "-pe orte 12", which seems to be the right one,
>
> I get
> Got 12 processors.
> Machines:
> cat: /tmp/38.1.all.q/machines: No such file or directory

As Lydia wrote: you don't need this argument, just leave the
-machinefile ... option (-hostfile in your script) out. Open MPI will
detect the granted nodes on its own from the original pe_hostfile. The
$TMPDIR/machines file would be created by the start_proc_args for other
MPI libraries, but that start procedure can be left out here, hence the
file won't be created.
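
A minimal sketch of the submission script with that change applied,
assuming Open MPI was built with SGE support and that the "orte" PE
from the quoted message is the one in use. The helper name
list_granted_hosts is made up for illustration; it only prints the
first two columns (host, slots) of the pe_hostfile that SGE provides
via $PE_HOSTFILE, replacing the "cat $TMPDIR/machines" line that fails
here. Paths and the program name are taken from the original script.

```shell
#!/bin/bash
#$ -N ROMS_TEST
#$ -cwd
#$ -j y
#$ -q all.q
#$ -pe orte 12
#$ -S /bin/bash

# Hypothetical helper: print "host slots" for each line of an SGE
# pe_hostfile (columns: host, slots, queue, processor range).
list_granted_hosts() {
    awk '{ print $1, $2 }' "$1"
}

# Only do the real work inside an SGE parallel job, where SGE sets
# PE_HOSTFILE to the file describing the granted slots.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    echo "Got $NSLOTS processors."
    echo "Machines:"
    list_granted_hosts "$PE_HOSTFILE"

    export PATH=/opt/openmpi-1.3.3/bin:$PATH
    export LD_LIBRARY_PATH=/opt/openmpi-1.3.3/lib:/opt/intel/Compiler/11.1/046/lib/intel64:$LD_LIBRARY_PATH

    # No -hostfile/-machinefile here: an SGE-aware Open MPI is expected
    # to pick up the granted nodes on its own.
    mpirun -np "$NSLOTS" ./roms roms.in
fi
```

If the processes still all land on one node with a script like this,
questions a) and b) above (SGE support in the Open MPI build, and the
PE definition) are the next things to check.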

-- Reuti


> --------------------------------------------------------------------------
> Open RTE was unable to open the hostfile:
>     /tmp/38.1.all.q/machines
> Check to make sure the path and filename are correct.
> .....
> Xav
>> -- Reuti

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216233

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


