[GE users] is SGE integrated in mpiexec?

goncalo goncalo at lip.pt
Thu Mar 11 20:02:47 GMT 2010



Hi Reuti...

>> OpenMPI and MPICH2. My problem is that I do not understand how those
>> tight integration are done... via qrsh
>>      
> Yep, with the optional parameter "-inherit". I suggest to start
> getting Open MPI working, then you can look into MPICH2:
> http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
>    

I think I have it working with Open MPI, with qrsh using rshd. I have just
submitted a 16-instance job. Looking at the process tree, I see:

1.) On the first node (12 cores):
1.1) "mpirun" starts 2 "qrsh -inherit" processes.
1.2) there is a "qrsh_starter", called by a different "sge_shepherd",
which calls orted and 12 job instances.

2.) On the second node:
2.1) there is only a "qrsh_starter", called by "sge_shepherd", which calls
orted and 4 job instances.

- Could you explain the relationship between 1.1) and 1.2)? Why
are they started by different sge_shepherd processes?
- Where exactly does rsh come into play?



Here is the process tree on the first node:

|-sge_execd,5185
| |-sge_shepherd,5455 -bg
| | |-1988639,5463 /usr/local/sge/pro/default/spool/hpc001/job_scripts/1988639
| | |-mpirun,5464 --prefix /usr/mpi/gcc/openmpi-1.2.8 -hostfile/usr/local/sge/pro/default/spool/hpc001/active_jobs/1988639.
| | |-qrsh,5465 -inherit -noshell -nostdin -V hpc001.ncg.ingrid.pt /usr/mpi/gcc/openmpi-1.2.8/bin/orted --no-daemonize --bootproxy 1 --name 0.0.1--num_pro
| | | |-{qrsh},5475
| | | |-{qrsh},5477
| | |-qrsh,5466 -inherit -noshell -nostdin -V hpc002.ncg.ingrid.pt /usr/mpi/gcc/openmpi-1.2.8/bin/orted --no-daemonize --bootproxy 1 --name 0.0.2--num_pro
| | |-{qrsh},5476
| | |-{qrsh},5478
| |-sge_shepherd,5467 -bg
| | |-qrsh_starter,5468 /usr/local/sge/V62u1/default/spool/hpc001/active_jobs/1988639.1/1.hpc001 noshell
| | | |-orted,5479 --no-daemonize --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename hpc001.ncg.ingrid.pt--un
| | | |-cpi.openmpi,5480
| | | |-cpi.openmpi,5481
| | | |-cpi.openmpi,5482
| | | |-cpi.openmpi,5483
| | | |-cpi.openmpi,5484
| | | |-cpi.openmpi,5485
| | | |-cpi.openmpi,5486
| | | |-cpi.openmpi,5487
| | | |-cpi.openmpi,5488
| | | |-cpi.openmpi,5489
| | | |-cpi.openmpi,5490
| | | |-cpi.openmpi,5491


Here is the process tree on the second node:

|-sge_execd,22291
| |-sge_shepherd,13828 -bg
| | |-qrsh_starter,13829/usr/local/sge/V62u1/default/
| | | |-orted,13836 --no-daemonize --bootproxy 1 --name 0.0.2-
| | | |-cpi.openmpi,13837
| | | |-cpi.openmpi,13838
| | | |-cpi.openmpi,13839
| | | |-cpi.openmpi,13840
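
As a quick cross-check that the two trees account for all 16 instances, here is a minimal Python sketch that counts entries by process name in pstree-style text. The function name count_instances and the abbreviated sample listings are illustrative assumptions, not part of SGE or Open MPI; only the process names and PIDs are taken from the trees above. The pattern skips whatever tree-drawing prefix characters precede the "name,pid" field.

```python
import re

# Optional tree-drawing prefix (spaces, "|", "`", "-", "+", or "?"),
# followed by "name,pid" as printed by a pstree-style listing.
ENTRY = re.compile(r"^[\s|`?+-]*([^\s,]+),(\d+)")


def count_instances(tree_text, name):
    """Count entries in pstree-style text whose process name equals `name`."""
    count = 0
    for line in tree_text.splitlines():
        match = ENTRY.match(line)
        if match and match.group(1) == name:
            count += 1
    return count


# Abbreviated listings rebuilt from the PIDs shown in the two trees.
node1 = "\n".join(
    ["| | |-qrsh_starter,5468", "| | | |-orted,5479 --no-daemonize"]
    + [f"| | | |-cpi.openmpi,{pid}" for pid in range(5480, 5492)]
)
node2 = "\n".join(
    ["| | |-qrsh_starter,13829", "| | | |-orted,13836 --no-daemonize"]
    + [f"| | | |-cpi.openmpi,{pid}" for pid in range(13837, 13841)]
)

total = count_instances(node1, "cpi.openmpi") + count_instances(node2, "cpi.openmpi")
print(total)  # 12 on the first node + 4 on the second = 16
```

The same function could be fed the full saved output of the process listing on each host instead of the abbreviated samples.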


>> connections for the user applications for all hosts).
>>      
> Which version of SGE? The -builtin- method would fit all needs of
> yours, unless you need X11 forwarding.
>    

I'm using SGE 6.2u1. Is it possible to use the -builtin- method with this
version? What does it do?
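
To see which startup method a cluster is currently set to use, one option is to look at the rsh/rlogin/qlogin entries of the cluster configuration (rsh_command, rsh_daemon and friends, as described in sge_conf(5)). The following is a minimal Python sketch, assuming the configuration text has the usual "name value" layout printed by "qconf -sconf". The function name remote_startup_settings and the SAMPLE text are hypothetical and only illustrate the parsing; they are not output from this cluster.

```python
# Configuration entries that select the interactive/remote startup method.
WANTED = {
    "rsh_command", "rsh_daemon",
    "rlogin_command", "rlogin_daemon",
    "qlogin_command", "qlogin_daemon",
}


def remote_startup_settings(conf_text):
    """Return the rsh/rlogin/qlogin entries found in `qconf -sconf` style text."""
    settings = {}
    for line in conf_text.splitlines():
        parts = line.split(None, 1)  # split into name and the rest of the line
        if len(parts) == 2 and parts[0] in WANTED:
            settings[parts[0]] = parts[1].strip()
    return settings


# Hypothetical sample; real values depend on the installation.
SAMPLE = """\
execd_spool_dir   /hypothetical/spool
rsh_command       builtin
rsh_daemon        builtin
"""

for key, value in sorted(remote_startup_settings(SAMPLE).items()):
    print(key, "=", value)
```

On a real installation, the text would come from running "qconf -sconf" on a host with SGE access rather than from the SAMPLE string.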

Thanks
Goncalo

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=248068
