[GE users] Regarding CheckPointing

rajesh britto britto.gridlab at gmail.com
Sat Oct 4 07:17:57 BST 2008


     I need some help regarding checkpointing. I am submitting a job in SGE
and I can't find out how the checkpointing is done. I submitted a sample
checkpointing script that I found on a website, but it runs as a normal
sequential job.

     [sgeadmin at slaserver ~]$ cat chk.sh

       # check_transparent1.sh
       # On USR2 (the checkpoint signal), append a timestamp to the checkpoint file.
       trap 'date >> "$SGE_CKPT_DIR/checkpoint_1"' USR2
       echo "Script started."
       for ((i=0; i<100; i++)) ; do
           sleep 1
           echo "$i iteration"
       done
       echo "Script finished."
       exit 0

   [sgeadmin at slaserver ~]$ qconf -sckpt check
       ckpt_name check
       interface transparent
       ckpt_command none
       migr_command none
       restart_command none
       clean_command none
       ckpt_dir /home/sgeadmin/ckpt
       signal usr2
       when xmr

     [sgeadmin at slaserver ~]$ qconf -sq all.q

       qname all.q
       hostlist @allhosts
       seq_no 0
       load_thresholds np_load_avg=1.75
       suspend_thresholds NONE
       nsuspend 1
       suspend_interval 00:05:00
       priority 0
       min_cpu_interval 00:05:00
       processors UNDEFINED

       ckpt_list check
       pe_list lam
       rerun FALSE
       slots 2,[slaserver.au.chn.in=1],[slanode02.au.chn.in=1]
       tmpdir /tmp
       shell /bin/sh
       prolog NONE
       epilog NONE ................

       I need to run user-level checkpointing...
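
       To make clear what is meant by user-level checkpointing here: the
       job itself saves enough state to pick up where it left off if it is
       restarted. The following is only a minimal sketch under assumptions,
       not a tested SGE setup. The names run_job, state_file, total and the
       chk_state file name are hypothetical; SGE_CKPT_DIR and JOB_ID are
       the variables SGE is expected to export to the job, with fallbacks
       so the sketch also runs outside the scheduler.

```shell
#!/bin/sh
# Sketch of user-level checkpointing: the job saves and restores its own state.
# run_job, state_file and total are hypothetical names, not part of SGE.

run_job() {
    state_file=$1
    total=$2
    start=0
    # Resume from the saved iteration if an earlier run left a state file.
    if [ -f "$state_file" ]; then
        start=$(cat "$state_file")
    fi
    echo "Script started at iteration $start."
    i=$start
    while [ "$i" -lt "$total" ]; do
        echo "$i iteration"
        i=$((i + 1))
        # Record the next iteration to run so a restart skips finished work.
        echo "$i" > "$state_file"
    done
    echo "Script finished."
    # Nothing is left to resume once the loop completes.
    rm -f "$state_file"
}

# SGE_CKPT_DIR is assumed to be set for jobs submitted with a checkpointing
# environment, and JOB_ID for every job; the fallbacks are for manual runs.
ckpt_dir=${SGE_CKPT_DIR:-${TMPDIR:-/tmp}}
run_job "$ckpt_dir/chk_state.${JOB_ID:-manual}" 5
```

       If the job is killed partway through and started again with the same
       state file, the loop continues from the recorded iteration instead of
       from 0. A real job would write its state less often and to storage
       that every execution host can reach.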

------- R B
