[GE users] How important is checkpointing support?

agay agay at cc.huji.ac.il
Thu May 21 12:20:11 BST 2009

Dangruhn, thanks!

Well, if a non-pro user doesn't see a difference when running Office applications, then VMware seems to have done an excellent job!

> It sounds like this approach would be ideal
> for you. You'll need to do some things we
> aren't, but I don't see any show stoppers.

Yes, it looks feasible; I just need to work out the details.

> To be fair, SGE seems to support
> checkpoint/restart quite nicely. The
> problems are really at the application
> level: applications that can
> easily stop running on one machine and
> pick-up where it left off on
> another machine are few and far between.

I agree, SGE does everything it should do to support checkpointing. The problem is with the docs, e.g. the admin course book, which may give the impression that you can move jobs whenever you like. I know an old hand who got this impression and had to be convinced otherwise.

It's possible to do checkpointing not only at the application level but also at the user level (e.g. Condor) and the kernel level (e.g. Mosix). VM technology even offers "live" (or save/restore) migration of whole operating systems, complete with all the processes running on them. Maybe this last method doesn't have the inherent limitations of the previous ones?
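
To make the application-level case concrete, here is a minimal sketch of a program that saves its own state and picks up where it left off. It is only an illustration under my own assumptions: the function name run, the stop_after parameter (which simulates the job being killed) and the JSON checkpoint file are all hypothetical, and nothing in it is SGE-specific.

```python
import json
import os
import tempfile


def run(total_steps, ckpt_path, stop_after=None):
    """Sum the integers 0..total_steps-1, checkpointing after every step.

    stop_after (hypothetical, for illustration) simulates the job being
    stopped after that many steps in the current run.
    Returns (finished, accumulated_sum).
    """
    # Resume from the checkpoint file if a previous run left one behind.
    state = {"step": 0, "acc": 0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)

    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return False, state["acc"]  # simulated interruption

        # One unit of real work.
        state["acc"] += state["step"]
        state["step"] += 1
        done_this_run += 1

        # Write to a temporary file, then rename it over the old checkpoint,
        # so an interruption never leaves a half-written checkpoint.
        ckpt_dir = os.path.dirname(os.path.abspath(ckpt_path))
        fd, tmp_path = tempfile.mkstemp(dir=ckpt_dir)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, ckpt_path)

    return True, state["acc"]
```

If the checkpoint file sits on a shared filesystem, the second call to run() could just as well happen on another machine. The point is that the application has to be written this way from the start, which is exactly why such applications are rare.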

