[GE users] Long delay when submitting large jobs

Ron Chen ron_chen_123 at yahoo.com
Tue Feb 8 07:56:48 GMT 2005


--- Reuti <reuti at staff.uni-marburg.de> wrote:
> Aha, if I got it correctly you will still call lamboot which will use the TM. 
> Then use some times mpirun, and at the end a lamhalt shouldn't be really 
> necessary, as the daemons will be killed at the end of the job anyway.

Correct - in the current PBS-LAM integration, lamboot calls the PBS MOM (via tm_spawn()) on each
node to start lamd.

I am not sure which is better: having the batch system kill the LAM daemons, or letting
lamhalt do it (with the batch system killing them only when lamhalt fails?).

Anyway, I worked with Brian before on integrating SGE and LAM. Later on, other gridengine
developers decided that it is better to use SGE's qrsh + scripts to glue SGE and LAM together.
However, I find that many people don't know that SGE can be tightly integrated with LAM (though
not "out-of-the-box", as others mentioned):

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2005-February/009893.html
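
To make the "qrsh + scripts" idea a bit more concrete, here is a rough Python sketch (not the
actual integration scripts). It assumes the usual $PE_HOSTFILE layout of one
"host slots queue processor-range" line per node, and it only builds the "qrsh -inherit"
command lines without running anything. The function names and the "hostname" placeholder
command are made up for illustration; a real setup would start the LAM daemons instead.

```python
import os


def parse_pe_hostfile(text):
    """Return (host, slots) pairs from PE_HOSTFILE-style text."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        hosts.append((fields[0], int(fields[1])))
    return hosts


def qrsh_commands(hosts, remote_cmd):
    """Build one 'qrsh -inherit' argv per host; nothing is executed here."""
    return [["qrsh", "-inherit", host] + list(remote_cmd) for host, _slots in hosts]


if __name__ == "__main__":
    # Inside a parallel job SGE sets PE_HOSTFILE; outside one this prints nothing.
    path = os.environ.get("PE_HOSTFILE")
    if path:
        with open(path) as fh:
            for argv in qrsh_commands(parse_pe_hostfile(fh.read()), ["hostname"]):
                print(" ".join(argv))
```

Because the remote processes are started through qrsh -inherit, they stay under SGE's control,
which is what lets the batch system account for them and clean them up at the end of the job.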

There are some papers that Brian pointed me to, and I think they may be useful:

http://www.lam-mpi.org/papers/

"Integration of the LAM/MPI Environment and the PBS Scheduling System"

"Boot System Services Interface (SSI) Modules for LAM/MPI"

"The System Services Interface (SSI) to LAM/MPI"

 -Ron




		
