[GE users] Long delay when submitting large jobs

Reuti reuti at staff.uni-marburg.de
Mon Feb 7 13:47:37 GMT 2005


John Hearns wrote:
> On Mon, 2005-02-07 at 10:18 +0100, Reuti wrote:
>>- MPICH2 has its own mpiexec (which shares only the name, not the function, 
>>of the PBS tight-integration program). A different name would be better for 
>>this tight-mpiexec. Also, you can compile the MPICH2 startup in more than one 
>>way. I'm not sure whether all of them are supported, as the supplied version 
>>is for a beta version of MPICH2 only.
>>But please keep the start_proc_args/stop_proc_args anyway, as we need special 
>>directories to be created on the slave nodes. Maybe it's personal taste, but I 
>>would still prefer setting up all the daemons for LAM/MPICH2 in 
>>start_proc_args, so the end user can just use mpirun/mpich2-mpiexec, since 
>>everything is already set up.
> You raise a good point - it has probably been discussed before.
> How do we handle MPICH2 and MPD daemons in SGE?
> If MPD is used to start/stop the worker processes, then what role does
> SGE play, and how can SGE do tight integration, accounting, and keeping
> track of used/unused job slots?

I have already looked into this by starting one "ring" of daemons for 
each job, each ring with its own unique port number. So keeping track of 
slots is no problem:
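
How the unique port number is chosen is not spelled out here. One possible 
approach, given only as a minimal sketch and not as the setup referred to 
above, is to derive it from the job ID. The sketch assumes that SGE's JOB_ID 
variable is visible in the environment of the start procedure; the port 
window (BASE_PORT, PORT_RANGE) and the function name ring_port are 
hypothetical and would need to match whatever is free at your site.

```python
import os

# Hypothetical port window; adjust to a range that is unused at your site.
BASE_PORT = 20000
PORT_RANGE = 10000


def ring_port(job_id: int, base: int = BASE_PORT, span: int = PORT_RANGE) -> int:
    """Map a job ID to a port in [base, base + span)."""
    if job_id < 0:
        raise ValueError("job_id must be non-negative")
    return base + job_id % span


if __name__ == "__main__":
    # Assumption: JOB_ID is set by SGE in the job's environment.
    job_id = int(os.environ.get("JOB_ID", "0"))
    print(ring_port(job_id))
```

Two jobs whose IDs differ by a multiple of PORT_RANGE would get the same 
port, so this only works if the window is wider than the spread of job IDs 
that can be running at the same time.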


If you prefer a daemonless solution:


With the daemonless setup, the accounting may work. I must admit that I 
haven't looked into it, because accounting is not a concern for us. 
- Reuti
