[GE users] OpenMPI job stays on one node

reuti reuti at staff.uni-marburg.de
Mon Sep 7 12:14:11 BST 2009



On 07.09.2009 at 12:53, sgexav wrote:

> Hi, I still have the same problem:
> 12 jobs on 8 cores .......
> while the pe_hostfile is OK:
> /opt/gridengine/default/spool/compute-0-3/active_jobs/35.1/pe_hostfile
> compute-0-3
> compute-0-3
> compute-0-3
> compute-0-3
> compute-0-3
> compute-0-3
> compute-0-3
> compute-0-3
> compute-0-1
> compute-0-1
> compute-0-1
> compute-0-1

Did you also recompile your application (or at least relink it)?

-- Reuti
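
For reference, and only as a sketch: whether an Open MPI build has SGE support can be checked with `ompi_info | grep gridengine`, and the Open MPI FAQ mentioned further down in this thread describes a tight-integration parallel environment. Such a PE might look roughly like the following (the PE name `mpi` and the slot count are illustrative assumptions, not values taken from this thread):

```
$ qconf -sp mpi
pe_name            mpi
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
```

With control_slaves set to TRUE and an SGE-aware Open MPI, mpirun obtains the allocation from SGE directly, so the -hostfile argument can usually be dropped.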


> **** compute-0-3
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 10857 xavier    25   0  228m  61m 5092 R 100.1  0.4   0:29.90 roms
> 10853 xavier    25   0  228m  52m 5068 R  99.8  0.3   0:29.39 roms
> 10860 xavier    25   0  228m  61m 5036 R  99.8  0.4   0:29.48 roms
> 10855 xavier    25   0  228m  61m 5144 R  99.1  0.4   0:29.83 roms
> 10863 xavier    25   0  228m  61m 5036 R  56.2  0.4   0:17.37 roms
> 10856 xavier    25   0  228m  61m 5108 R  54.2  0.4   0:16.75 roms
> 10859 xavier    25   0  228m  61m 5044 R  51.2  0.4   0:15.60 roms
> 10862 xavier    25   0  228m  61m 5032 R  50.5  0.4   0:15.62 roms
> 10861 xavier    25   0  228m  61m 5036 R  49.5  0.4   0:15.53 roms
> 10854 xavier    25   0  228m  61m 5108 R  48.9  0.4   0:15.45 roms
> 10858 xavier    25   0  228m  61m 5076 R  45.2  0.4   0:14.07 roms
> 10864 xavier    25   0  228m  49m 4896 R  43.6  0.3   0:13.97 roms
>
> nothing on compute-0-1.
>
> Xav
>
> reuti wrote:
>> Hi,
>>
>> On 07.09.2009 at 12:04, sgexav wrote:
>>
>>
>>> Hi,
>>> I am running a parallel code with Open MPI.
>>> The code needs 12 processes, but the nodes have only 8 cores each.
>>>
>>> With the classical mpirun -np 12 ........ it works perfectly.
>>>
>>> But with SGE all 12 processes stay on one node.
>>>
>>> qsub script.sh
>>> Here is my submission script:
>>> #!/bin/sh
>>> #
>>> # job name
>>> #$ -N ROMS_TEST
>>> #
>>> # Use current working directory
>>> #$ -cwd
>>> #
>>> # Join stdout and stderr
>>> #$ -j y
>>>
>>> # define queue
>>> #$ -q all.q
>>>
>>> #$ -pe mpi 12
>>>
>>> # Run job through bash shell
>>> #$ -S /bin/bash
>>> #
>>> echo "Got $NSLOTS processors."
>>> echo "Machines:"
>>> cat $TMPDIR/machines
>>>
>>> export PATH=/opt/openmpi-1.3.3/bin/:$PATH
>>> export LD_LIBRARY_PATH=/opt/openmpi-1.3.3/lib:/opt/intel/Compiler/11.1/046/lib/intel64:$LD_LIBRARY_PATH
>>> #export OMPI_MCA_pls_rsh_agent=rsh
>>>
>>> mpirun -np $NSLOTS  -hostfile $TMPDIR/machines ./roms roms.in
>>>
>>
>> a) Did you compile Open MPI with SGE support?
>>
>> b) What does your PE look like? Did you follow the Howto for SGE on
>> the Open MPI site (http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge)?
>>
>> -- Reuti
>>
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216231

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


