[GE users] SGE/OpenMPI - all MPI tasks run only on a single node

reuti reuti at staff.uni-marburg.de
Wed Dec 16 22:59:24 GMT 2009


On 16.12.2009 at 23:50, k_clevenger wrote:

>> On 16.12.2009 at 22:31, k_clevenger wrote:
>>
>>> Thanks for responding, this problem is somewhat perplexing
>>>
>>>> On 16.12.2009 at 20:16, k_clevenger wrote:
>>>>
>>>>> When a job is submitted, all the tasks execute only on one
>>>>> node. If I submit the same job via mpiexec on the command
>>>>> line, the tasks are dispersed correctly.
>>>>>
>>>>> I have reviewed "OpenMPI job on stay on one node", "Using ssh with
>>>>> qrsh and qlogin", the SGE sections on the OpenMPI site, etc. with
>>>>> no solution.
>>>>>
>>>>> Nodes: 16 core x86_64 blades
>>>>> OS (all): CentOS 5.4 x86_64
>>>>> SGE Version: 6_2u4
>>>>> OpenMPI Version: 1.3.3 compiled with --with-sge
>>>>> ompi_info: MCA ras: gridengine (MCA v2.0, API v2.0, Component
>>>>> v1.3.3)
>>>>> IPTables off
>>>>>
>>>>> PE:
>>>>> pe_name            openmpi
>>>>> slots              32
>>>>> user_lists         NONE
>>>>> xuser_lists        NONE
>>>>> start_proc_args    /opt/sge-6_2u4/mpi/startmpi.sh -catch_rsh
>>>>> $pe_hostfile
>>>>> stop_proc_args     /opt/sge-6_2u4/mpi/stopmpi.sh
>>>>
>>>> Both entries can be /bin/true. The defined procedures don't hurt,
>>>> but aren't necessary for a tight Open MPI integration.
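With those two entries reduced to /bin/true, a minimal tight-integration PE could look like this (a sketch only; name, slot count, and the remaining values are taken from the configuration quoted above):

```
pe_name            openmpi
slots              32
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
```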
>>>
>>> OK
>>>
>>>>
>>>>> allocation_rule    $round_robin
>>>>> control_slaves     TRUE
>>>>> job_is_first_task  FALSE
>>>>> urgency_slots      min
>>>>> accounting_summary FALSE
>>>>>
>>>>> SGE script:
>>>>> #!/bin/sh
>>>>> #$ -pe openmpi 22
>>>>> #$ -N Para1
>>>>> #$ -cwd
>>>>> #$ -j y
>>>>> #$ -V
>>>>> #
>>>>> mpiexec -np $NSLOTS -machinefile $TMPDIR/machines ./hello_c
>>>>
>>>> You can leave "-machinefile $TMPDIR/machines" out.
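With a tight integration, mpiexec reads the granted allocation itself, so $TMPDIR/machines is redundant. For reference, a machinefile of this kind is derived from SGE's $PE_HOSTFILE, whose lines have the form "hostname slots queue processor-range". The sketch below uses made-up sample data and one plausible machinefile format; the exact format startmpi.sh writes may differ:

```shell
#!/bin/sh
# Sketch: derive a machinefile from an SGE $PE_HOSTFILE.
# Each $PE_HOSTFILE line: "hostname slots queue processor-range".
# Sample data below is made up for illustration.
cat > pe_hostfile.sample <<'EOF'
sunnode00.coh.org 11 all.q@sunnode00.coh.org UNDEFINED
sunnode01.coh.org 11 all.q@sunnode01.coh.org UNDEFINED
EOF

# Keep hostname and slot count; drop queue and processor range.
awk '{ print $1 " slots=" $2 }' pe_hostfile.sample > machines.sample
cat machines.sample
```

With a tight integration none of this is needed: `mpiexec -np $NSLOTS ./hello_c` alone is enough inside the jobscript.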
>>>
>>> When I do leave it out I get:
>>>
>>> error: commlib error: got read error (closing "sunnode00.coh.org/
>>> execd/1")
>>> error: executing task of job 262 failed: failed sending task to
>>> execd at sunnode00.coh.org: can't find connection
>>> --------------------------------------------------------------------------
>>> A daemon (pid 3692) died unexpectedly with status 1 while
>>> attempting to launch so we are aborting.
>>
>> Can you check whether you are using the correct mpiexec from the
>> version you compiled with --with-sge?
>>
>> which mpiexec
>>
>
> I checked earlier for any other mpirun/mpiexec anywhere in the file
> system; all clean.
>
> # which mpiexec
> /opt/openmpi-1.3.3/bin/mpiexec
>
> # ls -l /opt/openmpi-1.3.3/bin/mpiexec
> lrwxrwxrwx 1 root root 7 Nov  6 13:57 /opt/openmpi-1.3.3/bin/mpiexec -> orterun
>
> # ldd /opt/openmpi-1.3.3/bin/orterun
>   libopen-rte.so.0 => /opt/openmpi-1.3.3/lib/libopen-rte.so.0 (0x00002aaaaaaad000)
>   libopen-pal.so.0 => /opt/openmpi-1.3.3/lib/libopen-pal.so.0 (0x00002aaaaacf4000)
>   libdl.so.2 => /lib64/libdl.so.2 (0x0000003d2ec00000)
>   libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003d31c00000)
>   libutil.so.1 => /lib64/libutil.so.1 (0x0000003d3b600000)
>   libm.so.6 => /lib64/libm.so.6 (0x0000003d2f000000)
>   libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003d2f400000)
>   libc.so.6 => /lib64/libc.so.6 (0x0000003d2e800000)
>   /lib64/ld-linux-x86-64.so.2 (0x0000003d2e400000)

What is the output when you test this inside a jobscript (and also run
`ldd hello_c` there)? Depending on the .bashrc, the paths could be
different inside a jobscript.
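Such a diagnostic jobscript could look like this (a sketch; the binary name and PE are the ones from this thread):

```shell
#!/bin/sh
#$ -pe openmpi 2
#$ -cwd
#$ -j y
# Diagnostic only: confirm that the same mpiexec and the same
# libraries are seen inside an SGE jobscript as on the command line.
echo "PATH=$PATH"
which mpiexec
mpiexec --version
ldd ./hello_c
```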

If you want to avoid dynamic binaries: I prefer to compile Open MPI
with --enable-static --disable-shared.
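A static build of Open MPI could then be configured like this (a sketch, assuming the install prefix used elsewhere in this thread):

```
./configure --prefix=/opt/openmpi-1.3.3 --with-sge \
            --enable-static --disable-shared
make all install
```

Statically linked binaries sidestep the problem of jobscripts picking up different shared libraries on the compute nodes.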


>> in the job script. Also, are the dynamic libraries on the nodes the
>> ones from the --with-sge compilation?
>>
>
> Yes, all of /opt on suncluster (head node) is exported and mounted  
> under /opt on the SGE nodes. Both sge-6_2u4 and
> openmpi-1.3.3 are under /opt.
>
>>>> When you put a "sleep 30" in the jobscript and check with `qstat -g t`
>>>> the allocation during execution: were slots on both machines granted?
>>>
>>> job-ID  prior   name       user         state submit/start at     queue                          master ja-task-ID
>>> ---------------------------------------------------------------------------------------------------------------
>>>     262 0.60500 Job        kclevenger   r     12/16/2009 13:24:51
>>> all.q at sunnode00.coh.org        SLAVE
>>> all.q at sunnode00.coh.org        SLAVE
>>> all.q at sunnode00.coh.org        SLAVE
>>> all.q at sunnode00.coh.org        SLAVE
>>> all.q at sunnode00.coh.org        SLAVE
>>> all.q at sunnode00.coh.org        SLAVE
>>> all.q at sunnode00.coh.org        SLAVE
>>> all.q at sunnode00.coh.org        SLAVE
>>> all.q at sunnode00.coh.org        SLAVE
>>> all.q at sunnode00.coh.org        SLAVE
>>> all.q at sunnode00.coh.org        SLAVE
>>>     262 0.60500 Job        kclevenger   r     12/16/2009 13:24:51
>>> all.q at sunnode01.coh.org        MASTER
>>> all.q at sunnode01.coh.org        SLAVE
>>> all.q at sunnode01.coh.org        SLAVE
>>> all.q at sunnode01.coh.org        SLAVE
>>> all.q at sunnode01.coh.org        SLAVE
>>> all.q at sunnode01.coh.org        SLAVE
>>> all.q at sunnode01.coh.org        SLAVE
>>> all.q at sunnode01.coh.org        SLAVE
>>> all.q at sunnode01.coh.org        SLAVE
>>> all.q at sunnode01.coh.org        SLAVE
>>> all.q at sunnode01.coh.org        SLAVE
>>> all.q at sunnode01.coh.org        SLAVE
>>
>> Okay, this is fine.
>>
>>
>>>> The PE is attached as default to the queue or listed in both
>>>> machine's specific settings?
>>>
>>> I'm not certain what you mean here
>>
>> Inside the queue configuration, you can bind certain PEs (and also
>> other settings) to certain hosts or hostgroups. Although the above
>> listing rules it out as a wrong track, I had the idea that the PE
>> openmpi was only defined for sunnode00. Then you would always have
>> gotten only one machine despite the $round_robin setting.
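Such a host-specific binding uses the bracketed per-host override syntax in the queue configuration (`qconf -mq all.q`). A hypothetical example, with made-up values:

```
pe_list               make,[sunnode00.coh.org=make openmpi]
```

Here the openmpi PE would only be offered on sunnode00, so every parallel job using it would be confined to that host regardless of the allocation rule.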
>>
>
> Ah, got it
>
>>
>>>>
>>>>> Run via SGE
>>>>> Hello, world, I am 0 of 22 running on sunnode00.coh.org
>>>>> Hello, world, I am 1 of 22 running on sunnode00.coh.org
>>>>> ...
>>>>> Hello, world, I am 20 of 22 running on sunnode00.coh.org
>>>>> Hello, world, I am 21 of 22 running on sunnode00.coh.org
>>>>>
>>>>> All 22 tasks run on sunnode00
>>>>>
>>>>> Run via cmdline 'mpiexec -np 22 -machinefile $HOME/machines ./
>>>>> hello_c'
>>>>> Hello, world, I am 0 of 22 running on sunnode00.coh.org
>>>>> Hello, world, I am 1 of 22 running on sunnode01.coh.org
>>>>> ....
>>>>> Hello, world, I am 20 of 22 running on sunnode00.coh.org
>>>>> Hello, world, I am 21 of 22 running on sunnode01.coh.org
>>>>>
>>>>> 11 tasks run on sunnode00 and 11 tasks run on sunnode01
>>>>>
>>>>> I also get all 22 tasks running on one node if I run something  
>>>>> like
>>>>> 'qrsh -V -verbose -pe openmpi 22 mpirun -np 22 -machinefile $HOME/
>>>>> machines $HOME/test/hello'
>>>>>
>>>>> qconf -sconf output is attached
>>>>>
>>>>> ------------------------------------------------------
>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?
>>>>> dsForumId=38&dsMessageId=233785
>>>>>
>>>>> To unsubscribe from this discussion, e-mail: [users-
>>>>> unsubscribe at gridengine.sunsource.net].<qconf.txt>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=233822

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


