[GE users] Integrating SGE with condor and BLCR

Reuti reuti at staff.uni-marburg.de
Wed Nov 28 10:27:35 GMT 2007


Hi,

On 28.11.2007 at 09:55, Neeraj Chourasia wrote:

> Hello Guys,
>
>    I tried integrating SGE with third-party checkpointing libraries,
> namely Condor and BLCR, but I am unable to checkpoint the application.
> While searching the mailing list I found the issue below:
>
>        http://gridengine.sunsource.net/issues/show_bug.cgi?id=2037
>
>   I am able to checkpoint with Condor if I manually send the
> application a USR2 signal, but on suspending the queue/job, SGE does
> not checkpoint the application.
> The configuration of the Condor checkpoint is as follows:

I assume you followed:

http://gridengine.sunsource.net/howto/checkpointing.html

and

http://gridengine.sunsource.net/howto/APSTC-TB-2004-005.pdf

What I see is that you didn't include "m" in the "when" option. You
will need this; alternatively, for BLCR, call the checkpointing script
from within the migrate script. The state diagrams in Lip Kian's Howto
show the actual behavior of SGE. Checkpoints are only created at
"min_cpu_interval" time steps.
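
A minimal sketch of the second option, i.e. a migrate script that first
calls the checkpoint script. It assumes the argument order from the
migr_command line quoted below (job_id, job_pid, ckpt_dir). The function
name and the CKPT_SCRIPT override are hypothetical, and whatever the
existing blcr_migrate.sh does after the checkpoint is left as a placeholder:

```shell
#!/bin/sh
# Hypothetical wrapper for blcr_migrate.sh: checkpoint first, then migrate.
# CKPT_SCRIPT defaults to the checkpoint script path from the quoted
# configuration; it can be overridden, e.g. for testing.
CKPT_SCRIPT="${CKPT_SCRIPT:-/home/neeraj/local/sge/ckpt/blcr/blcr_checkpoint.sh}"

migrate_with_checkpoint() {
    job_id="$1"
    job_pid="$2"
    ckpt_dir="$3"

    # All three arguments from the migr_command line are required.
    if [ -z "$job_id" ] || [ -z "$job_pid" ] || [ -z "$ckpt_dir" ]; then
        echo "usage: blcr_migrate.sh job_id job_pid ckpt_dir" >&2
        return 2
    fi

    # Write a fresh checkpoint before the job is moved away.
    "$CKPT_SCRIPT" "$job_id" "$job_pid" "$ckpt_dir" || return 1

    # Assumption: the original migrate logic would follow here.
    return 0
}

# Run only when called with arguments, as SGE would call migr_command.
if [ "$#" -gt 0 ]; then
    migrate_with_checkpoint "$@"
fi
```

With something like this in place, a suspend that triggers migr_command
would also produce a checkpoint, even without "m" in "when".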

-- Reuti


> >  qconf -sckpt check_transparent
> ckpt_name          check_transparent
> interface          TRANSPARENT
> ckpt_command       NONE
> migr_command       NONE
> restart_command    NONE
> clean_command      NONE
> ckpt_dir           /home/neeraj/checkpoint
> signal             USR2
> when               xs
>
>
> Similarly for BLCR
>
> >  qconf -sckpt BLCR
> ckpt_name          BLCR
> interface          APPLICATION-LEVEL
> ckpt_command       /home/neeraj/local/sge/ckpt/blcr/blcr_checkpoint.sh $job_id \
>                   $job_pid $ckpt_dir
> migr_command       /home/neeraj/local/sge/ckpt/blcr/blcr_migrate.sh $job_id \
>                   $job_pid $ckpt_dir
> restart_command    NONE
> clean_command      /home/neeraj/local/sge/ckpt/blcr/blcr_clean.sh $job_id \
>                   $job_pid $ckpt_dir
> ckpt_dir           /home/neeraj/checkpoint
> signal             NONE
> when               xsr
>
>
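
As a sketch of the first option, the "when" line of the BLCR definition
above could be changed (for example via "qconf -mckpt BLCR") so that it
also contains "m"; the other lines would stay as they are. The exact
combination of letters is an assumption, only the added "m" matters here:

```
when               xsrm
```

The interval for these periodic checkpoints is then taken from the
min_cpu_interval setting of the queue (shown by "qconf -sq" with the
queue name).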
> Please help me...
>
> -Neeraj
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net

