[GE users] job suspension based on queue priority

reuti reuti at staff.uni-marburg.de
Mon Nov 10 11:12:50 GMT 2008


On 08.11.2008 at 12:56, jigar_halani wrote:

> Yes, that will be fine. But can I re-queue these jobs?

Sure. Either by hand (submit with "-m s" to get a mail on
suspension, otherwise you won't notice it) and then use "qmod -rj
<job_id>", or via some automatism:

> As they are very few jobs which will be exceeding the time limit  
> and they are very small. And users will be informed by e-mail so  
> they can either delete the job or can run in medium queue.

You can requeue the job automatically by abusing the checkpointing
interface for it. You just need to create a checkpointing
interface with:

$ qconf -ackpt <a_name_you_like_here>

Just change one byte and save it. The default "when sx" is already
fine for our purpose. Attach this checkpointing interface to the
necessary queue(s), e.g. with "qconf -mq <your queue>", so that the
queue configuration shows:

$ qconf -sq <your queue>
ckpt_list             <a_name_you_like_here>
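
If you prefer not to go through the editor, the two steps above can
probably be scripted as well. This is only a sketch: the names
"requeue" and "all.q" are placeholders of mine, and the field values
are what I assume the "qconf -ackpt" template contains, so compare
them with the output of "qconf -sckpt" on your installation first.
It uses "qconf -Ackpt <file>" to add the interface from a file and
"qconf -aattr" to append it to the queue's ckpt_list.

```shell
#!/bin/sh
# Sketch: create a requeue-only checkpointing interface without the editor.
# CKPT_NAME and QUEUE are placeholders, not names from the original post.
CKPT_NAME="${CKPT_NAME:-requeue}"
QUEUE="${QUEUE:-all.q}"
ckpt_file="${TMPDIR:-/tmp}/ckpt_${CKPT_NAME}.$$"

# Assumed to mirror the default template; "when sx" is the part that matters.
cat > "$ckpt_file" <<EOF
ckpt_name          $CKPT_NAME
interface          userdefined
ckpt_command       none
migr_command       none
restart_command    none
clean_command      none
ckpt_dir           /tmp
signal             none
when               sx
EOF

if command -v qconf >/dev/null 2>&1; then
    # Add the interface from the file, then attach it to the queue.
    qconf -Ackpt "$ckpt_file" &&
    qconf -aattr queue ckpt_list "$CKPT_NAME" "$QUEUE"
else
    echo "qconf not found; definition left in $ckpt_file" >&2
fi
```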

Then submit the jobs with:

$ qsub -ckpt <a_name_you_like_here> my_job.sh

If you now suspend the job (or the complete queue), by hand or
automatically, the job will go into the "Rq" state, showing it was
requeued. Once it starts executing again, it will get the state "Rr".
You could also check inside the job whether it was restarted, via the
environment variable $RESTARTED (it's zero for normal runs and 1 or 2
for rerun jobs).
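
A job script could test the variable along these lines. Take it as a
rough sketch: describe_run is a helper name I made up, and treating
an unset $RESTARTED like a normal run is my assumption, so the script
also works when started outside of the queuing system.

```shell
#!/bin/sh
# my_job.sh: sketch of a job that notices it was requeued.
# describe_run is a hypothetical helper, not part of Grid Engine.
describe_run() {
    # An unset RESTARTED is treated like a normal first run (assumption).
    case "${RESTARTED:-0}" in
        0) echo "first run" ;;
        *) echo "rerun" ;;
    esac
}

if [ "$(describe_run)" = "rerun" ]; then
    # JOB_ID is set by Grid Engine; fall back to "unknown" elsewhere.
    echo "job ${JOB_ID:-unknown} was requeued, starting over" >&2
fi

# ... the actual work of the job goes here ...
```

As nothing is really checkpointed with this setup, a rerun starts
from the beginning, so the branch is mainly useful for cleaning up
partial output of the earlier attempt.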

-- Reuti

> Regards,
> Jigar Halani
> Talentain Technologies
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do? 
> dsForumId=38&dsMessageId=88345
> To unsubscribe from this discussion, e-mail: [users- 
> unsubscribe at gridengine.sunsource.net].

