[GE users] how to checkpoint cadence simulator spectre

Reuti reuti at staff.uni-marburg.de
Mon Nov 5 16:20:41 GMT 2007


Hi,

On 05.11.2007 at 16:27, Jan Sundermeyer wrote:

> Hello,
>
> we have recently installed SGE and want to use it with Cadence Spectre
> simulations.
> Right now I am trying to set up checkpointing for the simulator.
>
> Theoretically this should be quite simple:
>
> spectre writes a checkpoint on the reception of SIGUSR2 and terminates
> itself.
> If it is rerun with the option "+recover" and a checkpoint file is
> present it continues to simulate from that saved point.
>
> However, I have failed to get it to work with SGE 6.1u2.
>
> 1) Checkpointing via the transparent mode does not work.
> If I have it generate a checkpoint on suspend, the process gets
> killed.
> If I let it checkpoint on reschedule, no checkpoint is written, but on
> rerun it tries to skip the first simulation steps, which is rather
> unexpected behaviour.

You can find some nice state diagrams in this document:

http://gridengine.sunsource.net/howto/APSTC-TB-2004-005.pdf

> 2) Checkpointing via the application-level mode does not work (at least
> if I want to checkpoint on suspend), as the process gets suspended
> before it can receive SIGUSR2, so no checkpoint is written.

Yes, the checkpoint has to be created before the job gets suspended.
It can, however, be done in the migration procedure. This page might
explain some details:

http://gridengine.sunsource.net/howto/checkpointing.html
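
As a rough illustration of what such a migration procedure could do, the
sketch below sends SIGUSR2 to the simulator and waits until it has exited,
so that the checkpoint is written before anything else happens to the job.
The function name, its arguments and the 60 second default are made up for
this example, and how the PID reaches the script depends on your checkpoint
environment definition, so treat it as a starting point only.

```shell
#!/bin/sh
# Sketch of a migration script. The function name, the argument order and
# the 60 s default are illustrative; they are not taken from SGE or Spectre.

# Send SIGUSR2 to the given PID, then poll until the process has exited.
# Returns 0 once it is gone, 1 if the signal could not be delivered or the
# process is still there after the timeout.
checkpoint_and_wait() {
    pid=$1
    timeout=${2:-60}

    kill -s USR2 "$pid" 2>/dev/null || return 1

    elapsed=0
    while kill -0 "$pid" 2>/dev/null; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

The polling loop is there because sending a signal returns immediately; the
script should not report success while the simulator is still writing.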

Just to note: the signal will be sent to the complete process group,
so some trapping in the shell script might be necessary. Otherwise
the job appears to have been killed simply by receiving the USR2 signal.
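
A minimal sketch of such trapping, assuming a plain sh job script: the
shell catches SIGUSR2 itself and keeps running, while the command it
started still receives the signal with its default disposition. The names
run_job, usr2_seen and input.scs are placeholders for this example.

```shell
#!/bin/sh
# Sketch of a job-script wrapper; run_job and usr2_seen are illustrative names.

usr2_seen=0

# Run a command in the foreground while the shell itself survives SIGUSR2.
# The handler is deliberately non-empty: a signal ignored with trap '' is
# inherited as ignored by the child, whereas a caught signal is reset to
# its default there, so the simulator can still react to it.
run_job() {
    trap 'usr2_seen=1' USR2
    "$@"
    status=$?
    trap - USR2
    return "$status"
}

# Hypothetical use inside the job script (input.scs is a placeholder):
#   run_job spectre +recover input.scs
```

After run_job returns, usr2_seen tells the script whether a checkpoint
signal arrived, and the return value is the exit status of the command.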

-- Reuti
