[GE users] Long delay when submitting large jobs

Rayson Ho raysonho at eseenet.com
Mon Feb 7 20:37:30 GMT 2005

>you mean because it's just mentioned on the mpiexec page (version 0.77)?
>- the LAM/MPI patch is a little out of date and targets 6.5.8 
>(current is 7.1.1; does it still work?).

No, the current LAM-PBS integration (for LAM 7.x) doesn't use mpiexec;
instead, LAM calls the TM API directly.

>And: you can compile MPICH2 startup in more than one way. 
>I'm not sure whether all are supported, as the supplied version is 
>for a beta version of MPICH2 only.

You can still use the TM mpiexec with MPICH2; see this mail from the author
of the TM mpiexec:

"mpich2 - will the real mpiexec please stand up"

>But please keep the start_proc_args/stop_proc_args anyway, as we 
>need special directories to be created on the slave nodes. Maybe 
>it's personal taste, but I would still prefer setting up all the 
>daemons for LAM/MPICH2 in start_proc_args, and the end-user can 
>just use mpirun/mpich2-mpiexec, since all is already setup.

Agreed. Alternatively, we could use the existing prolog/epilog interface.
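
For reference, a minimal sketch of the start_proc_args approach for LAM,
assuming the standard SGE pe_hostfile layout and LAM 7.x boot-schema syntax
("host cpu=N"). The script name, its path and the schema file name are
hypothetical, and the sketch does not configure LAM's remote-start agent,
so on its own it is not a tight integration.

```shell
#!/bin/sh
# Hypothetical start_proc_args helper (name and location are assumptions),
# configured in the PE as:  start_proc_args  /path/to/startlam.sh $pe_hostfile
#
# SGE writes one line per granted host to $pe_hostfile:
#   <hostname> <slots> <queue> <processor range>
# A LAM 7.x boot schema accepts lines of the form "<hostname> cpu=<n>".

# Convert an SGE pe_hostfile (argument 1) into a LAM boot schema on stdout.
pe_hostfile_to_lam_schema() {
    awk 'NF >= 2 { printf "%s cpu=%s\n", $1, $2 }' "$1"
}

# Only act when a hostfile is passed, so the conversion function can be
# sourced or tested without starting any daemons.
if [ $# -ge 1 ]; then
    schema="${TMPDIR:-/tmp}/lamboot.schema"
    pe_hostfile_to_lam_schema "$1" > "$schema" || exit 1
    # Start the LAM daemons on all granted hosts before the job script runs.
    lamboot -v "$schema"
fi
```

A matching stop_proc_args would call lamhalt; the end user then only needs
mpirun inside the job script, since the daemons are already up.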

IMO, SGE should only take care of allocating nodes, doing job accounting,
and killing jobs; launching the parallel jobs should be left to a
parallel job launcher such as mpiexec.

When done correctly via the TM APIs, we still get tight PE integration, but
the work of integrating with each MPI implementation is offloaded to


>CU - Reuti