[GE users] Long delay when submitting large jobs

Reuti reuti at staff.uni-marburg.de
Tue Feb 8 09:55:31 GMT 2005

Many thanks for all the links and explanations. I will look into them and then 
continue the discussion.

The process group problem with MPICH is that the started processes jump out of 
the process group and are then no longer under the control of the qrsh_starter. 
If the platform you are using does not support shutting down processes via the 
additional group id, you have to prevent the creation of the new process group 
for now. I don't know why this isn't the default in MPICH. So: is the task 
started by PBS+mpiexec on a slave node already the process group leader, so that 
it can be shut down properly, or is PBS+mpiexec also losing control under some 
circumstances after the 
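
To make the process group issue concrete, here is a minimal Python sketch (not 
MPICH or SGE code). The helper name fork_and_report_pgid is made up for this 
illustration, and it assumes a POSIX system. It forks a child that optionally 
calls setpgid(0, 0), which is one way a process can leave its parent's process 
group, and reports both group ids:

```python
import os


def fork_and_report_pgid(detach):
    """Fork a child and return (parent_pgid, child_pgid).

    Hypothetical helper for illustration only. If detach is True, the
    child moves itself into a new process group before reporting.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: optionally leave the parent's process group.
        os.close(read_fd)
        if detach:
            os.setpgid(0, 0)  # become leader of a new process group
        os.write(write_fd, str(os.getpgrp()).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: read the group id the child reported, then reap the child.
    os.close(write_fd)
    child_pgid = int(os.read(read_fd, 64))
    os.close(read_fd)
    os.waitpid(pid, 0)
    return os.getpgrp(), child_pgid


if __name__ == "__main__":
    print("stays in group:", fork_and_report_pgid(detach=False))
    print("leaves group:  ", fork_and_report_pgid(detach=True))
```

In the detached case, a signal sent to the parent's process group (for example 
with os.killpg) would no longer reach the child, which is the kind of loss of 
control described above.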

But anyway, one thing struck me in the spec of "int tm_nodeinfo(list, nnodes)":

How to distinguish between "make one tm_spawn to node x and use threads/forks 
to get two CPUs" and "make two tm_spawn to node x"?
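
The ambiguity can be sketched in a few lines of Python. This is only an 
illustration: it assumes that the node list contains one entry per granted CPU, 
which the spec may or may not guarantee, and the function name spawn_plans and 
the host names are made up.

```python
from collections import Counter


def spawn_plans(slots):
    """Return two possible launch plans for a flat list of slots.

    Assumption: slots holds one host name per granted CPU.
    Each plan is a list of (host, cpus_used_by_that_task) tuples.
    """
    per_host = Counter(slots)
    # Plan A: one spawn per host, the task uses threads/forks internally.
    one_task_per_host = [(host, count) for host, count in per_host.items()]
    # Plan B: one spawn per slot, each task uses a single CPU.
    one_task_per_slot = [(host, 1) for host in slots]
    return one_task_per_host, one_task_per_slot


if __name__ == "__main__":
    plan_a, plan_b = spawn_plans(["x", "x", "y"])
    print(plan_a)
    print(plan_b)
```

Both plans are consistent with the same slot list, so the list alone does not 
say which one the job expects.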

Cheers - Reuti

Quoting Ron Chen <ron_chen_123 at yahoo.com>:

> --- Reuti <reuti at staff.uni-marburg.de> wrote: 
> > > Yes, but again, just offload the MPI implementation details to mpiexec.
> > > The TM interface is cleaner than what we have today.
> > 
> > Agreed.
> BTW, if OpenPBS/PBSPro/Torque and Gridengine all use TM, then maybe more MPI
> vendors will be happy to provide a TM-enabled "parallel job starter" (mpirun?)
> > Well, let me explain it this way: for now, SGE will catch the rsh, start the
> > rshd, and use a 'real' rsh to start the communication. Instead of starting
> > the rshd, why not directly start the program on the node as a child of the
> > shepherd? No rshd in the way. Would this work with all parallel programs out
> > of the box?
> Not exactly, because SGE itself (without qrsh/rshd) doesn't forward stdio.
> > I looked into the mpiexec source, and at first glance I saw support for all
> > the communication types like ch_p4 and gm. And the question was: is this
> > necessary because you have only PBS_MOM and no shepherd on the slave nodes
> > with PBS, so you must take care of the communication on your own?
> Not because of that.
> Remember, mpiexec only replaces mpirun, and the main difference is that
> mpiexec starts tasks using TM APIs instead of rsh.
> Honestly, I currently don't understand the mpiexec source, as I just started
> reading it today. But
> luckily, it is less than 5000 lines of code, and with lots of comments 8-)
> And I saw some discussions on the mpiexec list today:
> http://email.osc.edu/pipermail/mpiexec/2005/000723.html
> So it seems like there's something called the "listener", which is different
> depending on the communication mode, and there are also different env setups
> that need to be done depending on it.
> > And: what will mpiexec do with the process groups on the slaves?
> BTW, don't know much about that, what is it for?
> > And argh, will the TM also send the environment variables to the 
> > slave nodes like -V? Okay, I will see it - give me some days.
> Yes, from the tm manpage:
> http://www.usc.edu/hpcc/pbsman/man3/tm.html
> int tm_spawn(argc, argv, envp, where, tid, event)
> So it does allow you to pass **envp to the remote node.
>  -Ron
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net


More information about the gridengine-users mailing list