[GE users] Regarding CheckPointing

Reuti reuti at staff.uni-marburg.de
Sat Oct 4 11:39:17 BST 2008


Hi,

On 04.10.2008 at 08:17, rajesh britto wrote:

> Hi,
>
>      I need some help regarding checkpointing. I am submitting a
> job in SGE and I can't find out how checkpointing is done. I submitted
> a checkpointing sample which I found on a website, but it is
> running as a normal sequential job.
>
>      [sgeadmin at slaserver ~]$ cat chk.sh
>
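>        #!/bin/sh
>        # check_transparent1.sh
>        # On SIGUSR2, append a timestamp to a file in the checkpoint directory.
>        trap 'date >> "$SGE_CKPT_DIR/checkpoint_1"' USR2
>        echo "Script started."
>        # POSIX loop; the original "for ((...))" form is a bash extension.
>        i=0
>        while [ "$i" -lt 100 ]; do
>            sleep 1
>            echo "$i iteration"
>            i=$((i + 1))
>        done
>        echo "Script finished."
>        exit 0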
>
All checkpointing jobs run like ordinary sequential or parallel
jobs. Example 1 does not create a checkpoint at all; it only
demonstrates how signals are delivered for the "when m" setting of
the checkpointing interface. Please read on to example 2 in the Howto.
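
To make the difference concrete, here is a minimal sketch of what
user-level checkpointing can look like: the job writes its own state to
a file and picks it up again when it is started a second time. This is
not the Howto's example 2. The names STATE_FILE, STEPS and DELAY and the
helper functions are made up for illustration; only SGE_CKPT_DIR and
JOB_ID are variables set by SGE, and the sketch falls back to /tmp and
the shell PID when they are missing.

```shell
#!/bin/sh
# Sketch of user-level checkpointing: the job saves and restores its own state.
# STATE_FILE, STEPS and DELAY are illustrative names, not SGE settings.
CKPT_DIR=${SGE_CKPT_DIR:-${TMPDIR:-/tmp}}
STATE_FILE=${STATE_FILE:-$CKPT_DIR/chk_state.${JOB_ID:-$$}}

# Print the last saved iteration, or 0 if no checkpoint exists yet.
load_state() {
    if [ -r "$STATE_FILE" ]; then
        cat "$STATE_FILE"
    else
        echo 0
    fi
}

# Write to a temporary file and rename it, so that a kill in the middle
# of a write cannot leave a truncated checkpoint behind.
save_state() {
    echo "$1" > "$STATE_FILE.tmp" && mv "$STATE_FILE.tmp" "$STATE_FILE"
}

# Run iterations from the saved position up to $1, sleeping $2 seconds each.
run_job() {
    steps=$1
    delay=$2
    i=$(load_state)
    echo "Starting at iteration $i"
    while [ "$i" -lt "$steps" ]; do
        sleep "$delay"
        echo "$i iteration"
        i=$((i + 1))
        save_state "$i"
    done
    echo "Script finished."
}

run_job "${STEPS:-3}" "${DELAY:-1}"
```

Submitted with something like "qsub -ckpt check chk.sh" (using the
checkpointing environment from your ckpt_list), a restarted job would
continue from the saved iteration instead of from 0. A real job would
also remove the state file once it has finished.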

> min_cpu_interval 00:05:00

For testing purposes, this should be lowered to 15 seconds or so.
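
One way to change it from the command line is sketched below, assuming
admin rights. The queue name all.q is only a placeholder, since the
queue name is not shown in your output; substitute your own. The script
only calls qconf when it is found on the PATH and otherwise prints what
it would have run.

```shell
#!/bin/sh
# Placeholder queue name (assumption); replace with the queue the job runs in.
QUEUE=${QUEUE:-all.q}
INTERVAL=00:00:15

if command -v qconf >/dev/null 2>&1; then
    # Show the current value, then set the shorter test interval.
    qconf -sq "$QUEUE" | grep min_cpu_interval
    qconf -mattr queue min_cpu_interval "$INTERVAL" "$QUEUE"
else
    echo "qconf not found; would run: qconf -mattr queue min_cpu_interval $INTERVAL $QUEUE"
fi
```

Remember to set the value back to something larger after testing.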

-- Reuti


>
> processors UNDEFINED
> qtype BATCH INTERACTIVE
> ckpt_list check
> pe_list lam
> rerun FALSE
> slots 2,[slaserver.au.chn.in=1],[slanode02.au.chn.in=1]
> tmpdir /tmp
> shell /bin/sh
> prolog NONE
> epilog NONE ................
>
>        I need to run user-level checkpointing...
>
> ------- R B
>
>

