[GE users] Integrating SGE with condor and BLCR

Constantinos Evangelinos ce107 at MIT.EDU
Wed Nov 28 18:35:33 GMT 2007


On Wednesday 28 November 2007 07:36:37 am Reuti wrote:

> On 28.11.2007 at 12:29, Neeraj Chourasia wrote:
> > Thanks Reuti,
> >    It's working now, but I am not sure whether it can be used to
> > checkpoint an OpenMPI application. Since OpenMPI doesn't have its
> > own checkpointing implemented, can BLCR/Condor be extended to
> > support checkpointing?
>
> OpenMPI 1.3 will have built-in checkpointing AFAIK, so I would wait
> for that release. Checkpointing parallel apps is far more
> complicated than checkpointing serial ones.
>
> The only option for now would be to build application-level
> checkpointing into your application, i.e. the rank 0 process writes
> the computed data and the program state to a checkpoint file from
> time to time and resumes from it (as outlined for a serial
> application in my Howto).
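
That pattern is simple enough to sketch. Here is a minimal example,
assuming an iterative code whose state is a single array held on rank 0;
the file name, state layout and checkpoint interval are all made up for
illustration:

/* Application-level checkpointing as Reuti describes: rank 0
 * periodically serializes the iteration counter and the state,
 * and on startup everyone resumes from the last checkpoint. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define CKPT_FILE  "state.ckpt"  /* hypothetical checkpoint file */
#define CKPT_EVERY 100           /* iterations between checkpoints */

int main(int argc, char **argv)
{
    int rank, i, start = 0;
    const int n = 1000, niter = 10000;
    double *data;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    data = malloc(n * sizeof(double));
    for (i = 0; i < n; i++)
        data[i] = 0.0;

    /* On (re)start, rank 0 reads the last checkpoint, if any, and
     * broadcasts the iteration counter and the state to everyone. */
    if (rank == 0) {
        FILE *f = fopen(CKPT_FILE, "rb");
        if (f) {
            fread(&start, sizeof(int), 1, f);
            fread(data, sizeof(double), n, f);
            fclose(f);
        }
    }
    MPI_Bcast(&start, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(data, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (i = start; i < niter; i++) {
        /* ... one compute step; results gathered onto rank 0 ... */

        if (rank == 0 && i % CKPT_EVERY == 0) {
            /* Write to a temporary file and rename it, so a crash
             * mid-write never corrupts the previous checkpoint. */
            FILE *f = fopen(CKPT_FILE ".tmp", "wb");
            if (f) {
                fwrite(&i, sizeof(int), 1, f);
                fwrite(data, sizeof(double), n, f);
                fclose(f);
                rename(CKPT_FILE ".tmp", CKPT_FILE);
            }
        }
    }

    free(data);
    MPI_Finalize();
    return 0;
}

The write-then-rename step is the important design choice: if the job
dies mid-write, the last good checkpoint is still intact.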

The only MPI implementations with working checkpointing that I know of
are:
a) LAM 7 + BLCR. LAM has of course been abandoned in favor of OpenMPI,
but it still works, it has tight SGE integration (which I assume is a
must for any parallel checkpointing) and in my experience it routinely
offered lower latency and comparable if not higher bandwidth than
OpenMPI over Gigabit Ethernet (see the command sketch after this list).
b) MVAPICH2 + BLCR. In theory this is only for InfiniBand clusters, but
in practice MVAPICH2 + BLCR also works over Gigabit Ethernet. It is not
as fast as one would like for intra-node communication, as it does not
use shared memory and instead sends messages via the loopback of the
network interface. The problem is that tight SGE integration requires
using it with the smpd process manager instead of the Python-based mpd,
and I've been unable to get it to even build properly that way (the
configure scripts keep overriding my choices). It may be that MVAPICH2,
unlike the MPICH2 it is based on, really does rely on mpd. Or I just
need to try harder... ;-)
c) SCore MPICH, which comes with its own checkpointing mechanism and
additionally allows multiple MPI jobs to time-share the same nodes,
using coarser-grained context switching of gang-scheduled parallel
jobs.
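
To make (a) concrete, the pieces fit together roughly as follows. This
is a sketch from memory; the PE and checkpoint environment names are
placeholders, and Reuti's Howto plus the LAM/MPI 7 user guide have the
authoritative details:

# Run the job with LAM's BLCR checkpoint/restart (cr) module enabled
# ("C" is LAM shorthand for one process per available CPU):
mpirun -ssi cr blcr C ./my_mpi_app

# Checkpoint the whole parallel job by checkpointing mpirun itself;
# LAM propagates the request to all ranks:
cr_checkpoint <pid_of_mpirun>

# Restart later from the saved context file:
cr_restart context.<pid_of_mpirun>

# Under SGE, wrap these calls in a checkpoint environment (qconf
# -ackpt) whose ckpt_command/restart_command invoke them, and submit
# with something like:
qsub -ckpt blcr_lam -pe lam 8 job.sh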

Condor checkpointing does not extend to parallel processes AFAIK.

Constantinos 
-- 
Dr. Constantinos Evangelinos                    Room 54-1518, EAPS/MIT
Earth, Atmospheric and Planetary Sciences       77 Massachusetts Avenue
Massachusetts Institute of Technology           Cambridge, MA 02139
+1-617-253-5259/+1-617-253-4464 (fax)           USA
