[GE users] suspension under MPICH2 tight integration

Jason Crane jasonc at mrsc.ucsf.edu
Tue May 16 23:16:59 BST 2006


On Tue, 2006-05-16 at 23:35 +0200, Reuti wrote:
> Hi Jason,
> 
> Am 16.05.2006 um 22:45 schrieb Jason Crane:
> 
> > Hi,
> >
> > I'm writing in regard to the "Tight Integration of the MPICH2 library
> > into SGE" document
> > (http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html).
> > I have two questions.
> >
> > 1. The MPICH2 user's guide indicates that it is possible to suspend
> > and continue MPICH2 jobs, at least under mpd process management (it
> > says nothing explicit about smpd). However, a previous post mentioned
> > that MPI suspend isn't supported for slave tasks under SGE because of
> > timing problems:
> > (http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=15354)
> > If standalone MPICH2 suspension is supported, is the "timing problem"
> > introduced by the integration with SGE, or is it perhaps related to
> > using smpd? Is there anything I need to worry about if I attempt to
> > implement a custom suspend/resume method for suspending slave tasks
> > under tight integration with the MPICH2 smpd daemonless parallel
> > environment?
> 
> Just try it and let us know your results. Do you want to suspend it by
> hand, or by another parallel job? If the latter, with the same
> allocation of nodes, and how?

Ideally via a subordinate queue, on a node-by-node basis as needed, by
either a batch or a parallel job. I'll let you know the results if I get
anywhere with it.
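A custom suspend/resume method of the kind asked about in question 1 might be sketched as follows. This is a minimal, hypothetical sketch, not something taken from the howto or shipped with SGE: it assumes the queue's suspend_method and resume_method are set to a script (the path /usr/local/sge/bin/sge_signal.py is invented) that receives the $job_pid pseudo-variable, and it assumes that signalling the process group of that PID reaches the MPICH2 processes on the node. It does nothing about the timing problem mentioned in the earlier post.

```python
#!/usr/bin/env python3
"""Hypothetical custom suspend/resume method for an SGE queue.

Assumed queue configuration (script path is hypothetical):
    suspend_method  /usr/local/sge/bin/sge_signal.py suspend $job_pid
    resume_method   /usr/local/sge/bin/sge_signal.py resume $job_pid
"""
import os
import signal
import sys

# Map the action names used on the command line to POSIX signals.
ACTIONS = {
    "suspend": signal.SIGSTOP,
    "resume": signal.SIGCONT,
}


def signal_for(action):
    """Return the signal for an action name, or raise ValueError."""
    try:
        return ACTIONS[action]
    except KeyError:
        raise ValueError(f"unknown action: {action!r}") from None


def signal_job(pid, action, getpgid=os.getpgid, killpg=os.killpg):
    """Send the action's signal to the whole process group of pid.

    Assumption: the MPI processes on this node share the process group of
    the PID passed in by SGE. getpgid/killpg are parameters only so the
    function can be exercised without touching real processes.
    """
    sig = signal_for(action)
    pgid = getpgid(pid)
    killpg(pgid, sig)
    return pgid, sig


def main(argv):
    if len(argv) != 3:
        print(f"usage: {argv[0]} suspend|resume PID", file=sys.stderr)
        return 2
    try:
        signal_job(int(argv[2]), argv[1])
    except (ValueError, OSError) as exc:
        # Bad action/PID string, vanished process, or missing permission.
        print(f"{argv[0]}: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```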

> 
> > 2. The MPICH2 tight integration document indicates that job accounting
> > is handled accurately under the daemon-based smpd startup method.
> > Is it also handled correctly under the daemonless method?
> 
> Usually Tight Integration means that the created processes are under
> the control of SGE and that the accounting is correct. I mentioned it
> only for the smpd method because there some processes (the qrsh ones)
> are intentionally left outside of SGE's control, but they generate no
> load that would need to be accounted for.

Thanks for the confirmation. -Jason


