[GE users] Checkpointing howto example n°1

spow_ miomax_ at hotmail.com
Mon Jul 12 15:03:12 BST 2010



Thanks for your help, it works now.
The permissions on the output folder were not set correctly, which prevented the file from being created.
As for the empty $SGE_CKPT_DIR, that happened when I forgot to add the -ckpt option. (Don't laugh.)
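For anyone who hits the same two problems, here is a minimal sketch of both sides of the fix. The function names `prepare_ckpt_dir` and `check_ckpt_dir` are hypothetical helpers, not part of Grid Engine. The sketch assumes the job is submitted with `qsub -ckpt check_transparent test.sh`, so that Grid Engine exports `SGE_CKPT_DIR` into the job environment, and it assumes a shared directory such as the /home/checkpoint Reuti suggests in the quoted reply.

```shell
#!/bin/bash
# Hypothetical helpers, not part of Grid Engine.

# Admin side: create a checkpoint directory that every user's jobs can
# write to. Mode 1777 (as on /tmp) lets anyone create files, while the
# sticky bit stops users from deleting each other's files.
prepare_ckpt_dir() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    chmod 1777 "$dir" || return 1
}

# Job side: fail early if the checkpoint directory is unusable.
# SGE_CKPT_DIR comes up empty when the job was submitted without -ckpt.
check_ckpt_dir() {
    local dir="$1"
    if [ -z "$dir" ]; then
        echo "SGE_CKPT_DIR is empty: was the job submitted with -ckpt?" >&2
        return 1
    fi
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "Checkpoint directory $dir is missing or not writable" >&2
        return 2
    fi
    return 0
}
```

On the admin side, one would run `prepare_ckpt_dir /home/checkpoint` as root on a filesystem mounted on all compute nodes, then use `qconf -mckpt check_transparent` to point `ckpt_dir` at it. In the job script, `check_ckpt_dir "$SGE_CKPT_DIR" || exit 1` placed before the real work turns both mistakes into an explicit error instead of a silently missing file.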

However, I'm still wondering how user-level checkpointing works; I'm stuck on the basics.

My guess is that the admin distributes two files: a C program (the one in example n°3 of the howto) and a script whose main job is to locate that program. The user has nothing to do except modify the script so that it runs his own code.
Does the checkpointing figure out by itself which variables need to be saved?
Can it really work for two very different programs without changing anything? If a huge matrix is built from simple initial conditions, will it be saved? And how does it know that the matrix exists and needs to be saved?

What I want to avoid is end users having to modify the codes they currently run just to be able to checkpoint them.
But I seriously doubt that user-level checkpointing works that way; kernel-level checkpointing seems more appropriate.
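To illustrate why user-level (application-level) checkpointing cannot save arbitrary program state on its own: in that model the program itself decides what to write and how to read it back. The sketch below is a hypothetical bash job, not example n°3 of the howto; the function `run_job` and the file name `state` are illustrative assumptions. It saves only a loop counter under the checkpoint directory and resumes from it when the file exists, for instance after Grid Engine restarts a checkpointing job and sets `RESTARTED=1`. A real code holding a large matrix would have to write that matrix out itself (or recompute it from its initial conditions on restart).

```shell
#!/bin/bash
# Hypothetical application-level checkpointing: the job saves its own state.
# Only what this script explicitly writes (the loop counter) survives a restart.
run_job() {
    local ckpt_dir="$1" total="$2"
    local state_file="$ckpt_dir/state"
    local i=0
    # Resume from the last saved counter if a checkpoint file exists.
    if [ -f "$state_file" ]; then
        i=$(cat "$state_file")
        echo "Resuming at step $i (RESTARTED=${RESTARTED:-0})"
    fi
    while [ "$i" -lt "$total" ]; do
        # ... one unit of real work would go here ...
        i=$((i + 1))
        # Write to a temporary file, then rename it, so that a kill
        # in the middle of the write never leaves a truncated checkpoint.
        echo "$i" > "$state_file.tmp" && mv "$state_file.tmp" "$state_file"
    done
    echo "Finished $i steps"
}
```

Inside a job submitted with `-ckpt`, the call would be something like `run_job "$SGE_CKPT_DIR" 1000`. The point of the design is that nothing outside the script knows what "state" means; a transparent (kernel-level) mechanism is what avoids this kind of change to the user's code.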


> Date: Thu, 8 Jul 2010 15:54:20 +0200
> From: reuti at staff.uni-marburg.de
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Checkpointing howto example n°1
> Hi,
> On 08.07.2010 at 14:50, spow_ wrote:
> > It appears I was badly mistaken about kernel-level vs. user-level checkpointing. Thanks for the clarification.
> > I am following Reuti's howto: http://gridengine.sunsource.net/howto/checkpointing.html and I am trying to get the first script (example 1) running.
> > I have set all options according to the howto:
> >
> > queue parameters:
> > shell /bin/bash
> > shell_start_mode unix_behavior
> > referenced_checkpoint_objects check_transparent
> > flush_submit_sec 4
> > min_cpu_interval 00:00:15
> >
> > check_transparent parameters:
> > interface transparent
> > commands none
> > ckpt_dir /tmp/checkpoint
> Often /tmp is local, i.e. the file will exist only on the compute node: it will be neither on the head node nor copied to the node where the rescheduled job starts. If you have more than one compute node, it's best to use a shared directory like the /home/checkpoint I suggested.
> That the name is not set is, of course, a different issue. Did you specify:
> $ qsub -ckpt check_transparent test.sh
> ...
> $ qstat -j <jobid>
> ...
> checkpoint_object: check_transparent
> checkpoint_attr: sx
> for the job submission?
> -- Reuti
> > signal USR2
> > when xmr
> >
> > Unfortunately, the file that should be created under $SGE_CKPT_DIR isn't, even though the script runs to the end without any errors.
> > If I echo $SGE_CKPT_DIR, it prints an empty line, even though the directory is correctly specified in the checkpoint parameters.
> > If I set SGE_CKPT_DIR=/tmp/checkpoint in the script right before that echo, it does print /tmp/checkpoint to the output file.
> > But in both cases no checkpoint file is created, so I cannot see the dates being written to it.
> >
> > A few years ago, a user named Sangamesh had a similar problem (though his file did get created), but I found no leads in the answers he was given.
> > Maybe there are external modules I have to install? I have a fresh install of SGE, nothing else.
> >
> >
> > Thanks for your help.
> > Guillaume Quéré
> >
> > PS: Sorry for not posting our private messages to the list, Reuti, but I get an error whenever I try to post from my previous account.
> >
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=266721

