[GE users] checkpointing and SGE

Jerry Mersel jerry.mersel at weizmann.ac.il
Wed Jun 27 12:20:56 BST 2007


Hi All:

    I decided not to go with BLCR, mostly because its mailing list seems
    to be very sparsely used and there seems to be very little help
    available.

    Can someone please recommend some checkpointing software that
    supports threads? It doesn't have to be open source.

                                                Thank you,
                                                  Jerry

Chris Dagdigian wrote:

> Hi Jerry,
>
> Grid Engine can't magically checkpoint your application for migration  
> to another node -- all it really does is play nicely with either  
> applications or Operating Systems that themselves are checkpoint-aware.
>
> Either the code itself needs to be able to checkpoint locally or you  
> need to be running Grid Engine on an operating system that can do  
> system-level checkpointing. To my knowledge, Linux with the standard
> Linux kernel does not have this sort of capability. I could not tell
> from your messages what OS and kernel you are talking about.
>
> Most people I know who seriously use checkpointing in production  
> environments are doing it at the application level these days.
>
> Regards,
> Chris
>
>
>
>
> On Jun 24, 2007, at 5:49 AM, Jerry Mersel wrote:
>
>> In addition, does the kernel have to be the same across all the nodes?
>>
>> It seems that the "N1GE6 Checkpointing and Berkeley Lab
>> Checkpoint/Restart" doc contradicts itself on whether a process can
>> migrate across nodes.
>>
>>                                                           Regards,
>>                                                               Jerry
>>
>> Jerry Mersel wrote:
>>
>>> Hi:
>>>
>>>  I have to checkpoint a process and then restart it on another node.
>>>  I also have to use kernel-level checkpointing because I don't always
>>>  have access to the code that is being run.
>>>
>>>  I read the documentation, "N1GE6 Checkpointing and Berkeley Lab
>>>  Checkpoint/Restart", and it seemed to say that a checkpointed
>>>  process can't migrate to other nodes.
>>>  Am I reading this correctly? Can someone recommend another method?
>>>
>>>
>>>                                                           Regards,
>>>                                                               Jerry
>>
>
> -- 
> Chris Dagdigian  <dag at sonsorol.org>
> Current coordinates: Boston-area, USA
> GPS: http://bioteam.net/dagbin/gps?42.385693+N+71.115535+W
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
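
A minimal sketch of the application-level approach described in the reply
above, where the code itself saves and restores its own state. This is only
an illustration under stated assumptions: the function names (load_state,
save_state, run), the JSON checkpoint format, and the stop_after parameter
(which simulates the job being stopped) are all hypothetical and are not
part of Grid Engine or BLCR. For a restart on another node to work, the
checkpoint file would have to live on a filesystem visible to both nodes.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_state(path, state):
    """Write the checkpoint atomically: temp file first, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic on POSIX, so a reader never sees a partial file.
    os.replace(tmp, path)


def run(path, n_steps, stop_after=None):
    """Sum the integers 0..n_steps-1, checkpointing after every step.

    stop_after (hypothetical, for demonstration) makes the function return
    early after that many steps, imitating a job that was stopped.
    """
    state = load_state(path)
    done = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and done >= stop_after:
            return None  # interrupted; progress is already on disk
        state["total"] += state["next_step"]
        state["next_step"] += 1
        save_state(path, state)
        done += 1
    return state["total"]
```

Calling run() a second time with the same checkpoint path continues from the
last saved step rather than starting over, which is the property a scheduler
relies on when it restarts a checkpoint-aware job.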




