[GE users] BLCR with SGE?

Reuti reuti at staff.uni-marburg.de
Tue Feb 22 17:57:22 GMT 2005


AFAIK checkpointing is simply not defined in the MPI standard. Where 
LAM/MPI and AMPI/Charm++ support it, it's an extension and not necessarily 
available in MPICH.

One problem with MPICH: how do you send a synchronous "do a checkpoint 
now!" signal to all nodes? If the daemon in LAM/MPI supports this, that's fine.

And AMPI/Charm++ can do it only at the application level, with all 
processes calling the checkpointing command (to be handled like an 
MPI_Barrier).
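
To illustrate the application-level approach, here is a minimal sketch. It uses Python threads as stand-ins for MPI ranks and threading.Barrier in place of MPI_Barrier; it is not the AMPI/Charm++ API, and all names (run_job, CHECKPOINT_EVERY, the in-memory "checkpoint" dict) are made up for illustration.

```python
import threading

NUM_WORKERS = 4        # stand-in for the number of MPI ranks (hypothetical)
CHECKPOINT_EVERY = 3   # checkpoint interval in steps (hypothetical)
TOTAL_STEPS = 7


def run_job():
    """Run all workers and return the checkpoints they wrote."""
    barrier = threading.Barrier(NUM_WORKERS)
    checkpoints = {}   # (rank, step) -> saved state; stands in for files on disk
    lock = threading.Lock()

    def worker(rank):
        state = 0
        for step in range(1, TOTAL_STEPS + 1):
            state += rank + step              # some local computation
            if step % CHECKPOINT_EVERY == 0:
                # Every worker calls the checkpoint at the same step,
                # so this is a collective operation like MPI_Barrier.
                barrier.wait()                # all workers have reached this step
                with lock:
                    checkpoints[(rank, step)] = state
                barrier.wait()                # nobody continues until all have saved

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return checkpoints
```

The point is that no outside signal is needed: the application itself decides when to checkpoint, and the barrier guarantees that all saved states belong to the same step.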

It's not SGE related. The Myrinet website also states that 
checkpointing is not supported.


Rayson: It's good that you are working on a TM library. I was just thinking 
of digging into LAM/MPI to write an SGE module, but now I will wait, 
because yours is of course the cleaner solution. It was also mentioned 
recently on the LAM list (or somewhere else?) that the TM of OpenPBS does 
not work exactly according to the docs, and that they programmed around it 
a little bit. So LAM/MPI with TM will not work if you implement TM strictly 
according to the docs. This is only to warn you; I can't find it again, but 
I read it somewhere.

Maybe you can move this topic to the LAM/MPI-dev list, or raise it there 
as well, at a later point.

Cheers - Reuti


Rayson Ho wrote:
> Do you know why it is not working with MPICH-GM?? Does it work at all even
> without running under SGE??
> 
> I don't have access to a Myrinet cluster, but since I am working on
> creating a TM lib for SGE (so that we can use mpiexec, for example), I am
> trying to find out if there's anything that we can do to support it...
> 
> Rayson
> 
> 
> 
>>  I implemented it on a Myrinet cluster (MPICH-GM) but there is no 
>>support for checkpointing MPI jobs.  The single processor jobs were 
>>checkpointed successfully.  If you run LAM, then you will be able to 
>>checkpoint parallel jobs.  Feel free to send your questions to Lip 
>>Kian, he was very helpful in my implementation.  Thanks again Lip.
>>
>>Ciao!
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net





