[GE users] Checkpointing howto example n°1

reuti reuti at staff.uni-marburg.de
Thu Jul 8 14:54:20 BST 2010



Hi,

On 08.07.2010 at 14:50, spow_ wrote:

> It appears I was badly mistaken about kernel-level versus user-level checkpointing. Thanks for the clarification.
> I am following Reuti's howto: http://gridengine.sunsource.net/howto/checkpointing.html and I am trying to get the first script (example 1) running.
> I have set all options according to the howto:
> 
> queue parameters :
> shell : bin/bash
> shell_start_mode : unix_behaviour
> referenced_checkpoint_objects : check_transparent
> flush_submit_sec=4
> min_cpu_interval 00:00:15
> 
> check_transparent parameters :
> interface       transparent
> commands    none
> ckpt_dir       /tmp/checkpoint

Often /tmp is local, i.e. the file will exist only on the compute node: it will neither appear on the head node nor be copied to the node where the rescheduled job starts. If you have more than one compute node, it's best to use a shared directory like the /home/checkpoint I suggested.
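To check whether a candidate checkpoint directory is local or shared, something along these lines may help. It is only a rough sketch: the function names are made up for illustration, the list of network filesystem types is an assumption to adapt to your site, and `stat -f -c %T` assumes GNU coreutils.

```shell
# Hypothetical helper: succeed if the filesystem type name looks like a
# network filesystem. The list is an assumption, not exhaustive.
is_shared_fstype() {
    case "$1" in
        nfs*|lustre|gpfs|cifs|smb*|afs|ceph) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print "shared" or "local" for a directory.
# Relies on GNU stat; other stat implementations use different options.
classify_dir() {
    fstype=$(stat -f -c %T "$1") || return 2
    if is_shared_fstype "$fstype"; then
        echo "shared ($fstype)"
    else
        echo "local ($fstype)"
    fi
}
```

Running `classify_dir /tmp/checkpoint` and `classify_dir /home/checkpoint` on a compute node should then show which of the two is actually visible from the other nodes.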

That the name is not set is, of course, a different issue. Did you specify:

$ qsub -ckpt check_transparent test.sh
...
$ qstat -j <jobid>
...
checkpoint_object:          check_transparent
checkpoint_attr:            sx    

for the job submission?
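To make this kind of problem visible inside the job itself, the script could refuse to continue when $SGE_CKPT_DIR is empty. The following is a minimal sketch and not the script from the howto: the function name `ckpt_file` and the file name pattern are made up for illustration, while SGE_CKPT_DIR and JOB_ID are the variables Grid Engine is expected to set for a checkpointing job.

```shell
# Hypothetical helper: print the path of the checkpoint file, or fail
# with a message if SGE_CKPT_DIR is empty (which suggests the job was
# not submitted with -ckpt).
ckpt_file() {
    if [ -z "${SGE_CKPT_DIR:-}" ]; then
        echo "SGE_CKPT_DIR is empty; was the job submitted with -ckpt?" >&2
        return 1
    fi
    # "unknown" is only a fallback for running this sketch outside a job.
    printf '%s/ckpt_%s.log\n' "$SGE_CKPT_DIR" "${JOB_ID:-unknown}"
}
```

In test.sh this could be used as `file=$(ckpt_file) || exit 1`, followed by `date >> "$file"`, so a missing -ckpt shows up as an error in the job's stderr file instead of a silently missing checkpoint file.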

-- Reuti


> signal           USR2
> when            xmr
> 
> Unfortunately, the file that should be created under $SGE_CKPT_DIR isn't, though the script does execute until the end, without any errors.
> If I echo $SGE_CKPT_DIR, it echoes blank, even though it is correctly specified in the checkpoint params.
> If I set SGE_CKPT_DIR=/tmp/checkpoint in the script right before the above echo, it does display /tmp/checkpoint in the output file.
> But in both cases, no checkpoint file is created, and I cannot witness the dates being printed in the checkpoint file (for it doesn't exist).
> 
> A few years back, a user named Sangamesh had a similar problem (though his file got created) but I found no leads in the answers he had been given.
> Maybe there are external modules I have to install? I have a 'fresh' install of SGE, nothing else.
> 
> 
> Thanks for your help.
> Guillaume Quéré
>  
> PS: sorry for not posting the private messages, Reuti, but I get an error any time I try to post from my previous account.
> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=266721
