[GE users] Checkpointing

sdiaz sdiaz at cesga.es
Tue May 12 12:01:07 BST 2009



Hello Bert,

Checkpointing in SGE is done every X seconds/minutes/hours while the job 
is running, and also when you suspend a job in order to migrate it 
(qmod -s JOB_ID). When it happens is configurable in a checkpoint 
environment. I think that by default checkpointing in SGE does not work 
without something additional: it is just a feature that can be 
integrated with a checkpoint mechanism such as BLCR (Berkeley Lab 
Checkpoint/Restart). You have to install a mechanism like this and then 
integrate it with SGE before it works.
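
To give an idea of what the integration looks like, here is a minimal 
sketch that writes out a checkpoint environment definition in the format 
shown by "qconf -sckpt NAME". The field names are the ones from 
sge_checkpoint(5); the environment name "blcr", the directory and all 
the script paths are assumptions made up for the example. The wrapper 
scripts are not provided by SGE or BLCR; you would write them yourself 
around BLCR's cr_checkpoint and cr_restart.

```python
# Sketch only: render an SGE checkpoint environment definition.
# The name "blcr", ckpt_dir and every script path below are hypothetical.

CKPT_FIELDS = [
    "ckpt_name", "interface", "ckpt_command", "migr_command",
    "restart_command", "clean_command", "ckpt_dir", "signal", "when",
]


def render_ckpt_env(settings):
    """Return the definition as text, one 'key value' pair per line."""
    missing = [name for name in CKPT_FIELDS if name not in settings]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    lines = [f"{name:<18} {settings[name]}" for name in CKPT_FIELDS]
    return "\n".join(lines) + "\n"


blcr_env = {
    "ckpt_name": "blcr",
    "interface": "application-level",
    # $job_pid and $ckpt_dir are expanded by SGE when the command runs.
    "ckpt_command": "/opt/sge/ckpt/blcr_checkpoint.sh $job_pid $ckpt_dir",
    "migr_command": "/opt/sge/ckpt/blcr_migrate.sh $job_pid $ckpt_dir",
    "restart_command": "none",
    "clean_command": "/opt/sge/ckpt/blcr_clean.sh $job_pid $ckpt_dir",
    "ckpt_dir": "/scratch/ckpt",
    "signal": "none",
    # x = on suspend, s = on execd shutdown, m = every min_cpu_interval
    "when": "xsm",
}

if __name__ == "__main__":
    print(render_ckpt_env(blcr_env), end="")
```

If you save the output to a file, "qconf -Ackpt FILE" should add the 
environment. It then has to be listed in the ckpt_list of a queue, and 
jobs request it with "qsub -ckpt blcr". The interval for the periodic 
checkpoint ("m") is the queue's min_cpu_interval.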

More information:
https://ftg.lbl.gov/CheckpointRestart/CheckpointRestart.shtml

Regards,
Sergio



goemb wrote:
> Dear users,
>
> Does anyone have an idea how checkpointing is implemented in SGE?
> Is it only a fail-safe mechanism, or is it also used for scheduling?
>
> Thank you all,
> Bert
>


-- 
Sergio Díaz Montes
Centro de Supercomputacion de Galicia
Avda. de Vigo. s/n (Campus Sur) 15706 Santiago de Compostela (Spain)
Tel: +34 981 56 98 10 ; Fax: +34 981 59 46 16
email: sdiaz at cesga.es ; http://www.cesga.es/
------------------------------------------------

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=194525
