[GE users] job suspension based on queue priority

reuti reuti at staff.uni-marburg.de
Mon Nov 10 13:07:45 GMT 2008


On 10.11.2008 at 13:26, jigar_halani wrote:

> Hi Reuti,
>
> Thanks for the reply. One more question on the same topic:
>
> Suppose the job is killed automatically instead of being suspended.
> Will it still be re-queued?

If you use a checkpointing interface, a suspend will kill and re-queue
the job. Without a checkpointing interface, it's only a suspend.
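
Since a re-queued job starts again from the top of its script, the script can look at $RESTARTED (described in the quoted message further down) to tell a first start from a rerun. A rough sketch, not a definitive recipe: the function name run_kind is only my illustration, and RESTARTED defaults to 0 here so the script also runs outside Grid Engine.

```shell
#!/bin/sh
# Sketch of a job script that reacts to being re-queued.
# RESTARTED is set by Grid Engine inside the job environment; the :-0
# default is an assumption so the script also works when run by hand.

# Print "fresh" for a normal first run and "rerun" for any non-zero value.
run_kind() {
    case "${RESTARTED:-0}" in
        0) echo "fresh" ;;
        *) echo "rerun" ;;
    esac
}

if [ "$(run_kind)" = "rerun" ]; then
    echo "job was re-queued; cleaning up or resuming before the real work"
else
    echo "first start of the job"
fi
```

The rerun branch is where any cleanup of partial output or resume logic for your application would go.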

> I am asking because the queue kills the job when it hits the hard
> limit.

If the job is killed because of a hard limit, it won't be re-queued.  
It would be useless. It would restart from the beginning and hit the  
limit again.

-- Reuti

> And if I have set a time limit, it will kill the job.
>
> Regards,
> Jigar halani
> Talentain Technologies
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Monday, November 10, 2008 4:43 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] job suspension based on queue priority
>
> Hi,
>
> On 08.11.2008 at 12:56, jigar_halani wrote:
>
>> Yes, that will be fine. But can I re-queue these jobs?
>
> Sure. Either by hand (you could set "-m s" to get a mail on
> suspension, otherwise you won't notice it) and then use "qmod -rj
> <job_id>", or via some automation:
>
>> Only very few jobs will exceed the time limit, and they are very
>> small. Users will be informed by e-mail so they can either delete
>> the job or run it in the medium queue.
>
> You can requeue the job automatically by abusing the checkpointing
> interface for it. You will just need to create a checkpointing
> interface with:
>
> $ qconf -ackpt <a_name_you_like_here>
>
> Just change one byte in the editor that opens and save it. The
> default "when sx" is already fine for our purpose. Attach this
> checkpointing interface to the necessary queue(s):
>
> $ qconf -sq <your queue>
> ...
> ckpt_list             <a_name_you_like_here>
> ...
>
> Then submit the jobs with:
>
> $ qsub -ckpt <a_name_you_like_here> my_job.sh
>
> If you now suspend the job (or the complete queue), by hand or
> automatically, the job will go into the "Rq" state, showing it was
> requeued. When it starts executing again, it will get the state "Rr".
> You can also check inside the job whether it was restarted by means
> of the environment variable $RESTARTED (it's zero for normal runs and
> 1 or 2 for rerun jobs).
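
Put together, the steps above might look roughly like the following script. The names requeue, short.q and requeue_ckpt.conf are placeholders of mine, and the field list is what I would expect from checkpoint(5) on a 6.x installation, so compare it with the template that "qconf -ackpt" shows you. It assumes "qconf -Ackpt" (add from a file) and "qconf -aattr" in place of the interactive editor, and it skips the Grid Engine calls when qconf isn't installed.

```shell
#!/bin/sh
# Sketch only: all three names below are hypothetical, adjust to your site.
CKPT_NAME=requeue
QUEUE=short.q
CONF=./requeue_ckpt.conf

# Write a checkpointing definition that does no real checkpointing.
# Only "when sx" matters for the requeue-on-suspend behaviour; the
# other fields are assumed defaults and may differ on your version.
write_ckpt_conf() {
    cat > "$1" <<EOF
ckpt_name          $CKPT_NAME
interface          userdefined
ckpt_command       none
migr_command       none
restart_command    none
clean_command      none
ckpt_dir           /tmp
signal             none
when               sx
EOF
}

write_ckpt_conf "$CONF"

# Only talk to Grid Engine when its tools are actually installed.
if command -v qconf >/dev/null 2>&1; then
    # Add the checkpointing interface from the file.
    qconf -Ackpt "$CONF"
    # Attach it to the queue's ckpt_list.
    qconf -aattr queue ckpt_list "$CKPT_NAME" "$QUEUE"
    # Submit with the interface; "-m s" mails on suspension.
    qsub -ckpt "$CKPT_NAME" -m s my_job.sh
else
    echo "qconf not found; wrote $CONF only"
fi
```

Afterwards "qconf -sq <your queue>" should list the interface under ckpt_list, as shown above.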
>
> -- Reuti
>
>
>>
>> Regards,
>> Jigar Halani
>> Talentain Technologies
>>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88376

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


