[GE users] suspension under MPICH2 tight integration

Reuti reuti at staff.uni-marburg.de
Wed May 17 22:58:17 BST 2006


On 17.05.2006 at 23:37, Jason Crane wrote:

> Hi,
>
> On Tue, 2006-05-16 at 23:35 +0200, Reuti wrote:
>
>>> 1. The MPICH2 user's guide indicates that it is possible to suspend
>>> and continue MPICH2 jobs, at least under mpd process management
>>> (it says nothing explicit about smpd). However, a previous post
>>> mentioned that MPI suspend isn't supported for slave tasks under SGE
>>> because of timing problems:
>>> (http://gridengine.sunsource.net/servlets/ReadMsglistName=users&msgNo=15354)
>>> If standalone MPICH2 suspension is supported, is the "timing problem"
>>> introduced by the integration with SGE, or is it related to using
>>> smpd? Is there anything I need to worry about if I attempt to
>>> implement a custom suspend/resume method for suspending slave tasks
>>> under tight integration with the MPICH2 smpd daemonless parallel
>>> environment?
>>
>> Just try it and let us know your results. Do you want to suspend it by
>> hand, or by another parallel job with the same allocation of nodes
>> (and if so, how)?
>
> I'm observing (in the trace file) that the custom suspend_method is
> executed only for the master node, not for the slave nodes of an MPI
> job. Do you know if there is a way to override this behavior at
This is the intended behavior. You have to use rsh/ssh inside the
master node's custom suspend_method to act on the slave nodes. What
does MPICH2 expect: a SIGSTOP sent to all involved processes at nearly
the same time?
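A minimal sketch of such a master-side suspend_method helper, in Python. Everything in it is an assumption rather than something from this thread: that the script is given the job's PE hostfile path, the job owner and a process-name pattern; that passwordless ssh to every host works; and that `pkill` on the nodes is precise enough to pick out the MPI processes. As we read queue_conf(5), a custom suspend_method replaces SGE's default signal delivery, so the master host is signalled too. All ssh calls are started before any is waited on, so the hosts get the signal at roughly the same time. The function names are hypothetical.

```python
import subprocess
from typing import List


def parse_pe_hostfile(text: str) -> List[str]:
    """Return the unique host names from a PE hostfile, in order.

    Each non-empty line is assumed to look like
    'hostname slots queue processor_range'; only the first field is used.
    """
    hosts: List[str] = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] not in hosts:
            hosts.append(fields[0])
    return hosts


def build_signal_commands(hosts: List[str], owner: str, pattern: str,
                          signal: str = "STOP") -> List[List[str]]:
    """Build one ssh command per host that sends `signal` to the owner's
    processes whose name matches `pattern` (hypothetical selection rule)."""
    return [["ssh", host, "pkill", f"-{signal}", "-u", owner, pattern]
            for host in hosts]


def signal_job_hosts(hostfile: str, owner: str, pattern: str,
                     signal: str = "STOP") -> int:
    """Signal the job's processes on every host listed in `hostfile`."""
    with open(hostfile) as fh:
        hosts = parse_pe_hostfile(fh.read())
    # Start every ssh first, then wait, so all hosts receive the signal
    # at nearly the same time instead of one after another.
    procs = [subprocess.Popen(cmd)
             for cmd in build_signal_commands(hosts, owner, pattern, signal)]
    # pkill exits with 1 when nothing matched; report the worst status.
    return max((p.wait() for p in procs), default=0)
```

A matching resume_method could call the same helper with signal "CONT".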

-- Reuti

> run time, or does it need to be handled at the source code level? If
> so, do you have any hints about where to look?
>
> thanks,
> Jason
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net





More information about the gridengine-users mailing list