[GE users] SGE support for cycle scavenging and Job checkpointing/migration

Atle Rudshaug atle at numericalrocks.com
Fri Sep 19 10:06:10 BST 2008


Hi!

Is anyone using SGE for cycle scavenging? We have a dedicated cluster 
for MPI jobs running under SGE, and a lot of spare resources on 
workstations all over the office. We would like to use the workstations 
mostly for serial jobs, but for threaded ones as well. However, we 
really need a checkpoint and migration option, so that jobs are 
preempted and paused or moved when the user returns to his or her 
workstation.

I have read about including Condor's libraries, but that approach has 
too many limitations (gfortran is not supported, input files are 
limited to a maximum of 2 GB, etc.).

I have seen something about BLCR 
(http://www.escience.cam.ac.uk/projects/camgrid/blcr.html). How well 
does that work with SGE? Or do we need to build manual checkpointing 
into our applications? That would involve a LOT of work, which would be 
nice to avoid.

The main question is: how well does SGE handle cycle scavenging and job 
migration?

- Atle

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



