[GE users] How important is checkpointing support?

dev dev_hyd2001 at yahoo.com
Fri May 15 17:25:14 BST 2009


I've been working with SGE for a few years now, but of late I've seen a lot of checkpointing requests from customers. It's becoming common to see users with more CPUs than licenses, and this is increasing demand for checkpointing. However, so far I've been utilizing checkpointing features only if the application itself supports it. I haven't tested much on Windows, but I guess the applications which provide checkpointing on Linux should also do the same in Windows environments. Even if a cluster is used interactively, my opinion is that checkpointing is useful for freeing resources for higher-priority jobs by checkpointing lower-priority ones. This is just my opinion, though!



--- On Thu, 5/14/09, agay <agay at cc.huji.ac.il> wrote:

From: agay <agay at cc.huji.ac.il>
Subject: [GE users] How important is checkpointing support?
To: users at gridengine.sunsource.net
Date: Thursday, May 14, 2009, 4:11 PM

SGE supports external checkpointing solutions. Such solutions allow SGE to restart/migrate jobs/processes to minimize the damage of host malfunctions and implement load balancing.

Do you find checkpointing useful in your work? Does it work ok with all your programs? Is it reliable?

I understand checkpointing is a non-trivial technology now actively developed on UNIX. Do you know any practical Windows solution?

What solution would you suggest for a cluster that is used interactively during the day, maybe with occasional reboots?


To unsubscribe from this discussion, e-mail: users-unsubscribe at gridengine.sunsource.net
