[GE users] job suspension based on queue priority

jigar_halani jigar at talentain.com
Mon Nov 10 12:26:40 GMT 2008


Hi Reuti, 

Thanks for the reply. I have one more question on the same topic:

Suppose that, instead of being suspended, the job is killed automatically:
will it be re-queued in that case as well? I am asking because the queue
kills a job once it reaches the hard limit, so if I have set a time limit,
the job will be killed.

Regards,
Jigar Halani
Talentain Technologies

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Monday, November 10, 2008 4:43 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] job suspension based on queue priority

Hi,

On 08.11.2008 at 12:56, jigar_halani wrote:

> Yes, that will be fine. But can I re-queue these jobs?

Sure. Either by hand (you could submit with "-m s" to get an e-mail on
suspension, otherwise you won't notice it) and then use "qmod -rj
<job_id>", or automatically:

> There are very few jobs that will exceed the time limit, and they
> are very small. Users will be informed by e-mail, so they can either
> delete the job or run it in the medium queue.

you can requeue the job automatically by abusing the checkpointing
interface for this. You just need to create a checkpointing
interface with:

$ qconf -ackpt <a_name_you_like_here>

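This opens an editor with the default template. Just change one byte
and save it. The default "when sx" is already fine for our purpose.
Attach this checkpointing interface to the necessary queue(s) (e.g. with
"qconf -mq <your queue>"), so that the queue configuration shows: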

$ qconf -sq <your queue>
...
ckpt_list             <a_name_you_like_here>
...
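If you would rather script these steps than go through the interactive editor, the following is a minimal sketch, assuming a Grid Engine 6.x installation where "qconf -Ackpt <file>" adds a checkpointing environment from a file and "qconf -aattr queue ckpt_list <name> <queue>" appends it to a queue's list. The names requeue_ckpt and all.q are hypothetical placeholders, and the field list mirrors the default template described in checkpoint(5); compare it with "qconf -sckpt <name>" output on your own cluster before relying on it.

```shell
#!/bin/sh
# Sketch: create the "requeue" checkpointing interface without an editor.
# CKPT_NAME and QUEUE are hypothetical placeholders; set them for your site.
CKPT_NAME=${CKPT_NAME:-requeue_ckpt}
QUEUE=${QUEUE:-all.q}

# write_ckpt_file FILE NAME: write a checkpointing environment in
# checkpoint(5) format, mirroring the template "qconf -ackpt" opens.
# "when sx" is the default: act when the job is suspended (x) and when
# the execution daemon shuts down (s).
write_ckpt_file() {
    cat > "$1" <<EOF
ckpt_name          $2
interface          userdefined
ckpt_command       none
migr_command       none
restart_command    none
clean_command      none
ckpt_dir           /tmp
signal             none
when               sx
EOF
}

ckpt_file=$(mktemp)
write_ckpt_file "$ckpt_file" "$CKPT_NAME"

if command -v qconf >/dev/null 2>&1; then
    # Add the checkpointing environment from the file ...
    qconf -Ackpt "$ckpt_file"
    # ... and append it to the queue's ckpt_list.
    qconf -aattr queue ckpt_list "$CKPT_NAME" "$QUEUE"
else
    # No Grid Engine client tools here: just show what would be added.
    echo "qconf not found; generated definition:"
    cat "$ckpt_file"
fi
rm -f "$ckpt_file"
```

Using a file keeps the definition under version control and makes the setup repeatable on other clusters.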

Then submit the jobs with:

$ qsub -ckpt <a_name_you_like_here> my_job.sh

If the job (or the complete queue) is now suspended, either by hand or
automatically, the job will go into the "Rq" state, showing it was
requeued. When it starts executing again, it will be in the "Rr" state.
Inside the job you can also check whether it was restarted via the
environment variable $RESTARTED (it is 0 for normal runs and 1 or 2
for rerun jobs).
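A minimal sketch of such a check in the job script, assuming only what is described above (RESTARTED is 0 on a normal run and non-zero after a requeue). The clean-up action and the file name partial_output.dat are hypothetical; replace them with whatever your job needs to do before starting over.

```shell
#!/bin/sh
# Sketch of a job script (my_job.sh) that checks whether Grid Engine
# restarted it. RESTARTED is 0 on a normal run and non-zero after a
# requeue; default to 0 when unset (e.g. when run outside Grid Engine).

run_mode() {
    if [ "${RESTARTED:-0}" -ne 0 ]; then
        echo "restarted"
    else
        echo "first-run"
    fi
}

case $(run_mode) in
    restarted)
        # Hypothetical clean-up: discard output left by the interrupted run.
        echo "Job was requeued (RESTARTED=$RESTARTED); starting over"
        rm -f partial_output.dat
        ;;
    first-run)
        echo "Normal first run"
        ;;
esac

# ... the actual work of the job goes here ...
```

Submit it as shown above, e.g. "qsub -ckpt <a_name_you_like_here> my_job.sh".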

-- Reuti


>
> Regards,
> Jigar Halani
> Talentain Technologies
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88345
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88369

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88374
