[GE users] Help: Checkpoint Problem

Reuti reuti at staff.uni-marburg.de
Thu Oct 9 11:20:56 BST 2008


Hi,

On 09.10.2008 at 07:36, Lee Amy wrote:

> I run parallel bioinformatics software on a cluster, and the MPI
> implementation is Open MPI 1.2.7. I know that this bioinformatics
> software doesn't have a built-in checkpoint function. So my question
> is: can I use SGE to achieve that? I have already read the great
> howto written by Reuti at
> http://gridengine.sunsource.net/howto/checkpointing.html

The first goal in your case should be to get checkpointing working
without SGE. For now, checkpointing is not available in the stable
version of Open MPI:

http://www.open-mpi.org/faq/?category=ft

You might try the developer version, or LAM/MPI in combination with
BLCR (http://ftg.lbl.gov/CheckpointRestart/CheckpointRestart.shtml).
Once you have this working, checkpoint creation and migration can
be triggered by SGE.

SGE supports checkpointing in the sense that it can trigger and manage
it, but it doesn't supply a checkpointing mechanism to the application
on its own.
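
As a rough sketch of how that hook-up could look once checkpointing
works outside SGE: a checkpointing environment is defined with
`qconf -ackpt <name>` (or `qconf -Ackpt <file>`) and requested at
submission time with `qsub -ckpt <name>`. The field names below are the
ones described in checkpoint(5); the environment name `blcr_example`,
the script paths, the directory and the `when` value are placeholders
chosen only for illustration, not a tested setup.

```text
# Illustration only: all values are placeholders. This comment line is
# an annotation and should be removed before passing the file to qconf.
ckpt_name          blcr_example
interface          application-level
ckpt_command       /path/to/your_checkpoint_script.sh $job_pid $ckpt_dir
migr_command       /path/to/your_migrate_script.sh $job_pid $ckpt_dir
restart_command    none
clean_command      /path/to/your_clean_script.sh $job_id $ckpt_dir
ckpt_dir           /path/to/checkpoint/dir
signal             none
when               sx
```

The scripts themselves would have to call whatever checkpointer you got
working by hand (for example the BLCR or LAM/MPI tools); SGE only
decides when they are run. See checkpoint(5) for the meaning of the
`when` letters and the available `$...` variables.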

-- Reuti
