[GE users] SGE/OpenMPI - all MPI tasks run only on a single node

reuti reuti at staff.uni-marburg.de
Wed Dec 16 21:53:16 GMT 2009


On 16.12.2009, at 22:31, k_clevenger wrote:

> Thanks for responding; this problem is somewhat perplexing.
>
>> On 16.12.2009, at 20:16, k_clevenger wrote:
>>
>>> When a job is submitted, all the tasks execute on only one node. If
>>> I submit the same job via mpiexec on the command line, the tasks
>>> are dispersed correctly.
>>>
>>> I have reviewed "OpenMPI job on stay on one node", "Using ssh with
>>> qrsh and qlogin", the SGE sections on the OpenMPI site, etc. with
>>> no solution.
>>>
>>> Nodes: 16 core x86_64 blades
>>> OS (all): CentOS 5.4 x86_64
>>> SGE Version: 6_2u4
>>> OpenMPI Version: 1.3.3 compiled with --with-sge
>>> ompi_info: MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.3)
>>> IPTables off
>>>
>>> PE:
>>> pe_name            openmpi
>>> slots              32
>>> user_lists         NONE
>>> xuser_lists        NONE
>>> start_proc_args    /opt/sge-6_2u4/mpi/startmpi.sh -catch_rsh $pe_hostfile
>>> stop_proc_args     /opt/sge-6_2u4/mpi/stopmpi.sh
>>
>> Both entries can be /bin/true. The defined procedures don't hurt, but
>> aren't necessary for a tight Open MPI integration.
>
> OK
>
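For reference, with a tight Open MPI integration those two lines of the PE can
simply be reduced to the following (a sketch; the remaining PE settings stay as
quoted below):

   start_proc_args    /bin/true
   stop_proc_args     /bin/true

and changed with `qconf -mp openmpi`.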
>>
>>> allocation_rule    $round_robin
>>> control_slaves     TRUE
>>> job_is_first_task  FALSE
>>> urgency_slots      min
>>> accounting_summary FALSE
>>>
>>> SGE script:
>>> #!/bin/sh
>>> #$ -pe openmpi 22
>>> #$ -N Para1
>>> #$ -cwd
>>> #$ -j y
>>> #$ -V
>>> #
>>> mpiexec -np $NSLOTS -machinefile $TMPDIR/machines ./hello_c
>>
>> You can leave "-machinefile $TMPDIR/machines" out.
>
> When I do leave it out I get:
>
> error: commlib error: got read error (closing "sunnode00.coh.org/execd/1")
> error: executing task of job 262 failed: failed sending task to execd at sunnode00.coh.org: can't find connection
> --------------------------------------------------------------------------
> A daemon (pid 3692) died unexpectedly with status 1 while attempting to launch so we are aborting.

Can you check whether you are using the correct mpiexec from the  
version you compiled with --with-sge? Put

which mpiexec

in the job script. Also, are the dynamic libraries on the nodes the  
ones from the --with-sge compilation?
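For example, a few check lines like these in the job script (just a sketch for
debugging; the grep patterns assume the Open MPI 1.3.3 component output you
posted above):

   #!/bin/sh
   #$ -pe openmpi 22
   #$ -cwd
   #$ -j y
   #$ -V
   which mpiexec                  # must resolve to the Open MPI 1.3.3 built --with-sge
   ompi_info | grep gridengine    # is the gridengine ras component present on this node?
   ldd ./hello_c | grep libmpi    # which libmpi.so does the binary pick up?
   mpiexec -np $NSLOTS ./hello_c

If `which mpiexec` points to some other MPI installation, that mpiexec knows
nothing about the slots SGE granted and will typically start all processes on
the local node.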


>> When you put a "sleep 30" in the jobscript and check the allocation
>> with `qstat -g t` during execution: were slots granted on both machines?
>
> job-ID  prior   name       user         state submit/start at     queue                          master ja-task-ID
> --------------------------------------------------------------------------------------------------------------
>     262 0.60500 Job        kclevenger   r     12/16/2009 13:24:51 all.q at sunnode00.coh.org        SLAVE
>                                                                   all.q at sunnode00.coh.org        SLAVE
>                                                                   all.q at sunnode00.coh.org        SLAVE
>                                                                   all.q at sunnode00.coh.org        SLAVE
>                                                                   all.q at sunnode00.coh.org        SLAVE
>                                                                   all.q at sunnode00.coh.org        SLAVE
>                                                                   all.q at sunnode00.coh.org        SLAVE
>                                                                   all.q at sunnode00.coh.org        SLAVE
>                                                                   all.q at sunnode00.coh.org        SLAVE
>                                                                   all.q at sunnode00.coh.org        SLAVE
>                                                                   all.q at sunnode00.coh.org        SLAVE
>     262 0.60500 Job        kclevenger   r     12/16/2009 13:24:51 all.q at sunnode01.coh.org        MASTER
>                                                                   all.q at sunnode01.coh.org        SLAVE
>                                                                   all.q at sunnode01.coh.org        SLAVE
>                                                                   all.q at sunnode01.coh.org        SLAVE
>                                                                   all.q at sunnode01.coh.org        SLAVE
>                                                                   all.q at sunnode01.coh.org        SLAVE
>                                                                   all.q at sunnode01.coh.org        SLAVE
>                                                                   all.q at sunnode01.coh.org        SLAVE
>                                                                   all.q at sunnode01.coh.org        SLAVE
>                                                                   all.q at sunnode01.coh.org        SLAVE
>                                                                   all.q at sunnode01.coh.org        SLAVE
>                                                                   all.q at sunnode01.coh.org        SLAVE

Okay, this is fine.
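So the allocation itself is correct: SGE granted 11 slots on each node. With the
right mpiexec and a tight integration the job script then doesn't need a
machinefile at all; a minimal version would be something like this (a sketch of
your script from above, only with the -machinefile option dropped):

   #!/bin/sh
   #$ -pe openmpi 22
   #$ -N Para1
   #$ -cwd
   #$ -j y
   #$ -V
   mpiexec -np $NSLOTS ./hello_c

Open MPI's gridengine support reads the granted hosts from $PE_HOSTFILE on its
own, so even "-np $NSLOTS" could be omitted once the mpiexec/library question
above is sorted out.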


>> Is the PE attached as default to the queue, or listed in both
>> machines' specific settings?
>
> I'm not certain what you mean here

Inside the queue configuration, you can bind certain PEs (and also  
other settings) to certain hosts or hostgroups. Although the listing  
above rules this out as a wrong track, my idea was that the PE  
openmpi might have been defined only for sunnode00. Then you would  
always have gotten only one machine, despite the $round_robin setting.
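Such a binding would show up in the pe_list of the queue configuration
(`qconf -sq all.q`), e.g. a purely illustrative line like:

   pe_list               make,[sunnode00.coh.org=openmpi]

With a setting like that, the openmpi PE would be valid only on sunnode00's
queue instance, whereas a plain

   pe_list               make openmpi

offers the PE on every host of the queue.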



>>
>>> Run via SGE
>>> Hello, world, I am 0 of 22 running on sunnode00.coh.org
>>> Hello, world, I am 1 of 22 running on sunnode00.coh.org
>>> ...
>>> Hello, world, I am 20 of 22 running on sunnode00.coh.org
>>> Hello, world, I am 21 of 22 running on sunnode00.coh.org
>>>
>>> All 22 tasks run on sunnode00
>>>
>>> Run via cmdline 'mpiexec -np 22 -machinefile $HOME/machines ./hello_c'
>>> Hello, world, I am 0 of 22 running on sunnode00.coh.org
>>> Hello, world, I am 1 of 22 running on sunnode01.coh.org
>>> ....
>>> Hello, world, I am 20 of 22 running on sunnode00.coh.org
>>> Hello, world, I am 21 of 22 running on sunnode01.coh.org
>>>
>>> 11 tasks run on sunnode00 and 11 tasks run on sunnode01
>>>
>>> I also get all 22 tasks running on one node if I run something like
>>> 'qrsh -V -verbose -pe openmpi 22 mpirun -np 22 -machinefile $HOME/machines $HOME/test/hello'
>>>
>>> qconf -sconf output is attached
>>>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=233811

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


