[GE users] checkpointing with blcr

Jerry Mersel jerry.mersel at weizmann.ac.il
Wed Dec 12 11:44:31 GMT 2007


Yep, that helped.

          Thanks,
            Jerry


> Jerry,
>
> One solution might be to have your job exit with 99.  A job exit code of
> 99 tells the qmaster to reschedule the job.  That's the main mechanism
> for a job to say, "I landed in a bad place.  Please move me somewhere
> else."
>
> Daniel
>
> Jerry Mersel wrote:
>> Hi:
>>
>>  I have managed to checkpoint and rerun an application successfully,
>> with migration.
>>  But I won't be able to do that if the PID is already in use on the
>> machine that the process migrates to.
>>
>>  What I want to do is have the job wait in its queue until the PID
>> becomes free.
>>  I simulated a situation where the PID is in use. When the batch script
>> finds that the PID is in use, it calls
>>  qalter -q $QUEUE $JOB_ID
>>
>> But it didn't work. The job was just killed.
>>
>> Any ideas?
>>
>>                               Regards,
>>                                 Jerry
>>
>> P.S. I use BLCR and application-level checkpointing as in the how-to.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>
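
For anyone finding this thread later, the exit-99 suggestion quoted above could be sketched roughly as follows. This is only an illustration, not the script that was actually used: the function names and the CHECKPOINTED_PID and CONTEXT_FILE variables are made up for the example, the /proc test assumes a Linux host (which BLCR requires anyway), and it assumes rescheduling on exit code 99 has not been disabled on the cluster.

```shell
#!/bin/sh
# Sketch only: helper names and variables are hypothetical, not from the thread.

# Return 0 if the given PID is currently in use on this host.
# Assumes a Linux host with /proc mounted.
pid_in_use() {
    [ -e "/proc/$1" ]
}

# Print the exit code the job script should use before attempting a restart:
# 99 asks the qmaster to reschedule the job, 0 means it is safe to go on.
restart_exit_code() {
    if pid_in_use "$1"; then
        echo 99
    else
        echo 0
    fi
}

# Possible use inside the job script (placeholder variables):
#   code=$(restart_exit_code "$CHECKPOINTED_PID")
#   [ "$code" -eq 99 ] && exit 99
#   cr_restart "$CONTEXT_FILE"
```

Note that a rescheduled job may land on the same host again, so the check may simply repeat until the PID becomes free or the job is dispatched elsewhere.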





