[GE users] suspension under MPICH2 tight integration

Jason Crane jasonc at mrsc.ucsf.edu
Wed May 17 22:37:59 BST 2006


On Tue, 2006-05-16 at 23:35 +0200, Reuti wrote:

> > 1. The MPICH2 user's guide documentation indicates that it is possible
> > to suspend and continue MPICH2 jobs, at least under mpd process
> > management (nothing explicit about smpd).  However, in a previous post
> > it was mentioned that MPI suspend isn't supported for slave tasks under
> > SGE because of timing problems:
> > (http://gridengine.sunsource.net/servlets/ReadMsglistName=users&msgNo=15354)
> > If standalone MPICH2 suspension is supported, then is the "timing
> > problem" introduced by the integration with SGE, or perhaps it's related
> > to using smpd?  Is there anything I need to worry about if I attempt to
> > implement a custom suspend/resume method for suspending slave tasks
> > under tight integration with the MPICH2 smpd daemonless parallel
> > environment?
> just try and let us know your results. Do you want to suspend it by
> hand or with another parallel job (with the same allocation of nodes
> - how?)?

In the trace file I'm seeing that the custom suspend_method is executed
only on the master node of an MPI job, not on the slave nodes.  Can this
behavior be overridden at run-time, or does it have to be changed in the
source code?  If it requires a source change, do you have any hints about
where to look?
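
For illustration, here is a minimal sketch of the first step a master-side
workaround might take: working out which hosts hold the slave tasks, so
that a suspend_method script running on the master could forward the
signal to them itself.  It assumes the usual $PE_HOSTFILE layout (host,
slot count, queue, processor range on each line).  The function names and
the node01/node02 host names are hypothetical, and how the signal would
actually reach the slave processes is left open.

```python
def parse_pe_hostfile(text):
    """Return (host, slots) pairs from the text of a PE hostfile.

    Assumes one host per line: host, slot count, queue, processor range.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        entries.append((fields[0], int(fields[1])))
    return entries


def slave_hosts(entries, master):
    """Hosts other than the master, in hostfile order, without duplicates."""
    hosts = []
    for host, _slots in entries:
        if host != master and host not in hosts:
            hosts.append(host)
    return hosts


# Hypothetical two-node allocation; node01 plays the master.
sample = (
    "node01 2 all.q@node01 UNDEFINED\n"
    "node02 2 all.q@node02 UNDEFINED\n"
)
print(slave_hosts(parse_pe_hostfile(sample), "node01"))  # prints ['node02']
```

The master name has to match the hostfile spelling exactly (short name
versus fully qualified name), so a real script would probably need to
normalize the two before comparing.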

