[GE users] looking for OpenMPI start/stop proc_args

Reuti reuti at staff.uni-marburg.de
Fri Oct 17 13:19:11 BST 2008


Davide,

On 17.10.2008 at 13:30, Davide Cittaro wrote:

> Hi again,
>
> On Oct 17, 2008, at 1:16 PM, Reuti wrote:
>
>> For Open MPI 1.2.x the SGE support is built-in automatically, for  
>> 1.3 you will need to configure with "--with-sge".
>
>
> Ok, thanks
> I've noticed that batch jobs are spawned across different hosts  
> (good), but the same is not true for interactive jobs:
>
> $ qlogin -pe orte 8 -l arch=lx26-amd64
> Your job 186714 ("QLOGIN") has been submitted
> waiting for interactive job to be scheduled ...

Please forget about Open MPI for a moment and try to understand what
qlogin or qrsh (without a command) does on the submit and execution
nodes. Inspecting the process tree on both sides will reveal what's
going on: a fresh bash is started, and nothing is set in its
environment that would indicate it's running under SGE.
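
As a rough sketch of that check (the function name check_sge_env is made up for this example, and the three variables are only a sample of what SGE exports into a job's environment), something like the following could be run in the qlogin session and, for comparison, inside a batch job:

```shell
#!/bin/sh
# Sketch: report which SGE job variables are visible in the current shell.
# check_sge_env is a hypothetical helper, not part of SGE.
check_sge_env() {
    missing=0
    for var in JOB_ID NSLOTS PE_HOSTFILE; do
        eval "val=\${$var:-}"      # indirect lookup of the variable named in $var
        if [ -n "$val" ]; then
            echo "$var=$val"
        else
            echo "$var is not set"
            missing=1
        fi
    done
    return "$missing"              # 0 only if all three variables are set
}

check_sge_env || echo "this shell does not look like an SGE job environment"
```

For the process tree itself, `ps -e f` on Linux prints the parent/child relationships on each side.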

Today's discussion with Andreas Haupt was about exactly the same
problem, and a similar solution should also work for you. Just search
the SGE archive, unless the thread is still in your mail
application. It's always good to read other people's posts too.


Maybe it would be easier to run a command line like:

qrsh -pe orte 8 -l arch=lx26-amd64 mpirun -np \$NSLOTS ./hello_amd64
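
The backslash matters here: $NSLOTS has to be expanded inside the job, not by the shell you type the command into. A small sketch of the difference, where a local `sh -c` with NSLOTS=8 merely stands in for the shell on the execution side (this is an assumption for illustration, no qrsh is involved):

```shell
#!/bin/sh
# Stand-in for the execution side: an inner shell that has NSLOTS=8 set.
# run_inner is a hypothetical helper used only for this illustration.
run_inner() {
    NSLOTS=8 sh -c "$1"
}

NSLOTS=""    # on the submit side NSLOTS is normally empty or unset

# Unescaped: the outer shell expands $NSLOTS before the inner shell sees it.
unescaped=$(run_inner "echo np=$NSLOTS")

# Escaped: the literal text $NSLOTS reaches the inner shell, which expands it.
escaped=$(run_inner "echo np=\$NSLOTS")

echo "unescaped: $unescaped"
echo "escaped:   $escaped"
```

With the unescaped form, mpirun would be handed an empty -np argument; with the escaped form it gets the slot count granted to the job.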

> Your interactive job 186714 has been successfully scheduled.
> Establishing /opt/sge/util/qlogin_wrapper.sh session to host  
> node3.sge.ifom-ieo-campus.it ...

What is your wrapper doing in detail?

To avoid any problems with new SGE releases, I always put my own
scripts, PEs, and so on in a separate directory, /usr/sge/cluster.

-- Reuti


> Last login: Fri Oct 17 13:20:59 2008 from master.xtal.ifom-ieo-campus.it
> Linux node3.sge.ifom-ieo-campus.it 2.6.15-gentoo-r7-smp #1 SMP Thu May 11 14:11:25 CEST 2006 x86_64 AMD Opteron(tm) Processor 250 AuthenticAMD GNU/Linux
>  $ qstat -t
> job-ID  prior   name       user         state submit/start at     queue                          master ja-task-ID task-ID state cpu        mem     io      stat failed
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  186714 0.55052 QLOGIN     dcittaro     r     10/17/2008 13:27:30 login.q@node1.sge.ifom-ieo-cam SLAVE
>  186714 0.55052 QLOGIN     dcittaro     r     10/17/2008 13:27:30 login.q@node2.sge.ifom-ieo-cam SLAVE
>  186714 0.55052 QLOGIN     dcittaro     r     10/17/2008 13:27:30 login.q@node3.sge.ifom-ieo-cam MASTER
>                                                                   login.q@node3.sge.ifom-ieo-cam SLAVE
>  186714 0.55052 QLOGIN     dcittaro     r     10/17/2008 13:27:30 login.q@node4.sge.ifom-ieo-cam SLAVE
>  186714 0.55052 QLOGIN     dcittaro     r     10/17/2008 13:27:30 login.q@node5.sge.ifom-ieo-cam SLAVE
>  186714 0.55052 QLOGIN     dcittaro     r     10/17/2008 13:27:30 login.q@node6.sge.ifom-ieo-cam SLAVE
>  186714 0.55052 QLOGIN     dcittaro     r     10/17/2008 13:27:30 login.q@node8.sge.ifom-ieo-cam SLAVE
>  186714 0.55052 QLOGIN     dcittaro     r     10/17/2008 13:27:30 login.q@node9.sge.ifom-ieo-cam SLAVE
>
> $ mpirun -np 8 ./hello_amd64
> Now starting mpi stuff..
> Now starting mpi stuff..
> Now starting mpi stuff..
> Now starting mpi stuff..
> Now starting mpi stuff..
> Now starting mpi stuff..
> Now starting mpi stuff..
> Now starting mpi stuff..
> My rank before if is 0
> My name before if is node3.sge.ifom-ieo-campus.it
> My rank in if is 1
> My name in if is node3.sge.ifom-ieo-campus.it
>
> My rank before if is 0
> My name before if is node3.sge.ifom-ieo-campus.it
> My rank in if is 1
> My name in if is node3.sge.ifom-ieo-campus.it
>
> My rank before if is 0
> My name before if is node3.sge.ifom-ieo-campus.it
> My rank in if is 1
> My name in if is node3.sge.ifom-ieo-campus.it
>
> My rank before if is 0
> My name before if is node3.sge.ifom-ieo-campus.it
> My rank in if is 1
> My name in if is node3.sge.ifom-ieo-campus.it
>
> My rank before if is 0
> My name before if is node3.sge.ifom-ieo-campus.it
> My rank in if is 1
> My name in if is node3.sge.ifom-ieo-campus.it
>
> My rank before if is 0
> My name before if is node3.sge.ifom-ieo-campus.it
> My rank in if is 1
> My name in if is node3.sge.ifom-ieo-campus.it
>
> My rank before if is 0
> My name before if is node3.sge.ifom-ieo-campus.it
> My rank in if is 1
> My name in if is node3.sge.ifom-ieo-campus.it
>
> My rank before if is 0
> My name before if is node3.sge.ifom-ieo-campus.it
> My rank in if is 1
> My name in if is node3.sge.ifom-ieo-campus.it
>
>
> So, SGE allocates slots on 8 nodes, but Open MPI stays on the master...
> As I said, batch jobs work properly.
>
> d
> /*
> Davide Cittaro
>
> Cogentech - Consortium for Genomic Technologies
> via adamello, 16
> 20139 Milano
> Italy
>
> tel.: +39(02)574303007
> e-mail: davide.cittaro at ifom-ieo-campus.it
> */
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>

