[GE users] is SGE integrated in mpiexec?

reuti reuti at staff.uni-marburg.de
Fri Mar 12 10:45:45 GMT 2010



On 11.03.2010 at 21:02, goncalo wrote:

> Hi Reuti...
>
>>> OpenMPI and MPICH2. My problem is that I do not understand how those
>>> tight integrations are done... via qrsh
>>>
>> Yep, with the optional parameter "-inherit". I suggest getting
>> Open MPI working first, then you can look into MPICH2:
>> http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
>>
>
> I think I have it working with Open MPI with qrsh using rshd.

There is no rshd anywhere in the process chain, so you are using the
-builtin- method. It's adjustable in `qconf -mconf`. Just leave it
this way for now.
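
As a rough sketch of how one could check this: `qconf -sconf` prints the global configuration, and the `rsh_command` / `rsh_daemon` entries decide how `qrsh` with a command is started. The helper names and the sample text below are only illustrative assumptions, not output from the cluster discussed here.

```python
# Sketch: decide from `qconf -sconf` output whether qrsh uses the builtin method.
# Helper names and SAMPLE_CONF are illustrative assumptions.

RSH_KEYS = ("rsh_command", "rsh_daemon")


def rsh_settings(conf_text):
    """Return the rsh_command/rsh_daemon entries found in the configuration text."""
    settings = {}
    for line in conf_text.splitlines():
        parts = line.split(None, 1)  # "key   value with spaces"
        if len(parts) == 2 and parts[0] in RSH_KEYS:
            settings[parts[0]] = parts[1].strip()
    return settings


def uses_builtin(conf_text):
    """True only if both rsh entries are present and set to 'builtin'."""
    settings = rsh_settings(conf_text)
    return len(settings) == len(RSH_KEYS) and all(
        value == "builtin" for value in settings.values()
    )


# Hypothetical excerpt of a global configuration.
SAMPLE_CONF = """\
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin
"""

if __name__ == "__main__":
    print(uses_builtin(SAMPLE_CONF))
```

On a real cluster the text could be obtained with `subprocess.run(["qconf", "-sconf"], capture_output=True, text=True).stdout` instead of the sample string.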


> I have just
> sent a 16 instance job. Looking at the process tree, I see:
>
> 1.) On the first 12-core node:
> 1.1) "mpirun" starts 2 "qrsh -inherit" processes.
> 1.2) there is a "qrsh_starter" called by a different "sge_shepherd",
> which calls orted and 12 job instances.
>
> 2.) On the second node:
> 2.1) there is only a "qrsh_starter" called by "sge_shepherd", which
> calls orted and 4 job instances.
>
> - Could you explain the relationship between 1.1) and 1.2) a bit? Why
> are they started by different sge_shepherd processes?

That's just the way it's implemented, but Open MPI 1.2.8 is quite old:

http://www.open-mpi.org/software/ompi/versions/timeline.php

It looks different today, so I suggest moving to the current 1.4.1.
Its process tree is more self-explanatory, and a local qrsh is no
longer used for the processes on the master node.
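
To compare the process layout before and after such an upgrade, the `name,pid` entries of a process listing like the ones quoted further down can be tallied per node. This is only a minimal sketch: the regular expression and function name are my own assumptions, and the sample is abridged from the second node's listing.

```python
import re
from collections import Counter

# Matches "name,pid" at the start of a line, after any mail quote markers and
# tree-drawing characters (including "?" left over from mis-encoded output).
ENTRY = re.compile(r"^[>\s?|`+\-]*([^\s,]+),(\d+)")


def count_processes(listing):
    """Count how often each process name appears in a pstree-style listing."""
    counts = Counter()
    for line in listing.splitlines():
        match = ENTRY.match(line)
        if match:  # wrapped argument lines have no "name,pid" and are skipped
            counts[match.group(1)] += 1
    return counts


# Abridged from the second node's listing in this thread.
SECOND_NODE = """\
> |-sge_execd,22291
> | |-sge_shepherd,13828 -bg
> | | |-qrsh_starter,13829
> | | | |-orted,13836 --no-daemonize --bootproxy 1
> | | | |-cpi.openmpi,13837
> | | | |-cpi.openmpi,13838
> | | | |-cpi.openmpi,13839
> | | | |-cpi.openmpi,13840
"""

if __name__ == "__main__":
    for name, number in count_processes(SECOND_NODE).items():
        print(name, number)
```

For a tightly integrated job one would expect one orted per node and as many application processes as slots granted there; any local "qrsh -inherit" entries show up in the same tally.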

-- Reuti


> - Where does rsh enter the game exactly?
>
>
>
> Here is the tree of processes in the first node:
>
> |-sge_execd,5185
> | |-sge_shepherd,5455 -bg
> | | |-1988639,5463 /usr/local/sge/pro/default/spool/hpc001/job_scripts/1988639
> | | |-mpirun,5464 --prefix /usr/mpi/gcc/openmpi-1.2.8 -hostfile/usr/local/sge/pro/default/spool/hpc001/active_jobs/1988639.
> | | |-qrsh,5465 -inherit -noshell -nostdin -V hpc001.ncg.ingrid.pt /usr/mpi/gcc/openmpi-1.2.8/bin/orted --no-daemonize --bootproxy 1 --name 0.0.1--num_pro
> | | | |-{qrsh},5475
> | | | |-{qrsh},5477
> | | |-qrsh,5466 -inherit -noshell -nostdin -V hpc002.ncg.ingrid.pt /usr/mpi/gcc/openmpi-1.2.8/bin/orted --no-daemonize --bootproxy 1 --name 0.0.2--num_pro
> | | |-{qrsh},5476
> | | |-{qrsh},5478
> | |-sge_shepherd,5467 -bg
> | | |-qrsh_starter,5468 /usr/local/sge/V62u1/default/spool/hpc001/active_jobs/1988639.1/1.hpc001 noshell
> | | | |-orted,5479 --no-daemonize --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename hpc001.ncg.ingrid.pt--un
> | | | |-cpi.openmpi,5480
> | | | |-cpi.openmpi,5481
> | | | |-cpi.openmpi,5482
> | | | |-cpi.openmpi,5483
> | | | |-cpi.openmpi,5484
> | | | |-cpi.openmpi,5485
> | | | |-cpi.openmpi,5486
> | | | |-cpi.openmpi,5487
> | | | |-cpi.openmpi,5488
> | | | |-cpi.openmpi,5489
> | | | |-cpi.openmpi,5490
> | | | |-cpi.openmpi,5491
>
>
> Here is the tree of processes in the second node:
>
> |-sge_execd,22291
> | |-sge_shepherd,13828 -bg
> | | |-qrsh_starter,13829 /usr/local/sge/V62u1/default/
> | | | |-orted,13836 --no-daemonize --bootproxy 1 --name 0.0.2-
> | | | |-cpi.openmpi,13837
> | | | |-cpi.openmpi,13838
> | | | |-cpi.openmpi,13839
> | | | |-cpi.openmpi,13840
>
>
>>> connections for the user applications for all hosts).
>>>
>> Which version of SGE? The -builtin- method would fit all needs of
>> yours, unless you need X11 forwarding.
>>
>
> I'm using SGE6.2u1. Is it possible to use the -builtin- method? What
> does it do?
>
> Thanks
> Goncalo
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=248068
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=248143

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


