[GE users] Long delay when submitting large jobs

Rayson Ho raysonho at eseenet.com
Mon Feb 7 23:01:51 GMT 2005

>Aha, if I got it correctly you will still call lamboot, which will
>use the TM. Then run mpirun a few times, and at the end a lamhalt
>shouldn't really be necessary, as the daemons will be killed at
>the end of the job anyway.

Need to read the code... but I can't do it now :(

BTW, what does our current LAM 7.x implementation do??

>But I can get access to a PBS cluster, so I will look
>into it out of curiosity.

Thx, IMO our tight PE integration is not good enough because it doesn't
"work out of the box" -- you need to configure a PE, and you need to make
sure the "parallel job launcher" (e.g. mpirun) calls qrsh.
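
To make the "launcher calls qrsh" point concrete, here is a rough sketch (not
code from SGE or from any MPI package) of what a tight-integration wrapper has
to do: read the host list SGE writes to $PE_HOSTFILE and start each remote
task with "qrsh -inherit". The function names and the sample host lines are
made up for illustration; only the first two hostfile columns (host, slots)
are used.

```python
def parse_pe_hostfile(text):
    """Return (host, slots) pairs from pe_hostfile-style text.

    Assumes each non-empty line starts with a host name followed by
    a slot count; any further columns are ignored.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        hosts.append((fields[0], int(fields[1])))
    return hosts


def qrsh_command(host, argv):
    """Build the argument list that starts argv on host inside the job."""
    return ["qrsh", "-inherit", host] + list(argv)


# Hypothetical hostfile content, for illustration only.
sample = (
    "node01 2 all.q@node01 UNDEFINED\n"
    "node02 4 all.q@node02 UNDEFINED\n"
)

for host, slots in parse_pe_hostfile(sample):
    # A real wrapper would run this command; here we only print it.
    print(slots, " ".join(qrsh_command(host, ["hostname"])))
```

Every MPI implementation needs its own variant of this glue in the PE setup,
which is the part a TM-style interface would take over.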

And the PE for each MPI implementation is different, which is also a weak

>First we should see a working TM in SGE, and then decide whether
>anything can be removed

We may not need to remove anything; maybe we can create a type of PE
called the "tight TM PE", with which qmaster doesn't start rshds.

>or changed in the current PE support. I think
>also of other parallel environments like Linda and TCGMSG, which can
>already be used with SGE today without any problems. And if you use the
>daemonless version of MPICH2, it's also working today.

Yes, but again, just offload the MPI implementation details to mpiexec. The
TM interface is cleaner than what we have today.

>PBS has no shepherd and therefore the handling of communication in
>PBS-mpiexec?), and this function could be built also into SGE.

Sorry, don't understand what you are talking about.

PBS has MOM, and it is just like the execd.


>Cheers - Reuti