[GE users] Reschedule of a job

veerendra_n veerendra at yashasvi.co.in
Wed Feb 18 17:53:23 GMT 2009



My query concerns the Hard Run Time and Soft Run Time limits set on jobs:

Let's say we create many queues. One of the queues is for 1-minute jobs (i.e. short jobs that should take about 1 minute to complete). Now let's say 5 jobs can run simultaneously and all the slots are occupied. Ideally each job should have finished in around 1 minute, but for some reason the jobs are actually running longer. If a sixth job is queued, it should get priority since it is a short job. The way it should work is that the oldest job is suspended, freeing up one license; the sixth job (just submitted) runs; and the oldest job goes back into the queue for the license. When the sixth job is over, the oldest job can get the license again (it is in the queue and will be processed in queue order).
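To make the desired behaviour concrete, here is a minimal Python sketch of the policy described above. It only models the bookkeeping I am after and is not Grid Engine configuration; the names `submit`, `finish`, `running`, `waiting` and the job labels are hypothetical, chosen just for this illustration.

```python
from collections import deque

SLOTS = 5  # number of jobs allowed to run at once (from the scenario above)

def submit(running, waiting, job, slots=SLOTS):
    """Start `job`; if every slot is busy, suspend the oldest running job."""
    if len(running) >= slots:
        oldest = running.popleft()  # `running` is kept oldest-first
        waiting.append(oldest)      # the suspended job rejoins the queue
    running.append(job)

def finish(running, waiting, job):
    """Remove a finished job and resume the first waiting job, if any."""
    running.remove(job)
    if waiting:
        running.append(waiting.popleft())

# Five long-running jobs occupy all slots.
running = deque(["job1", "job2", "job3", "job4", "job5"])
waiting = deque()

# A sixth short job arrives: job1 (the oldest) is suspended to make room.
submit(running, waiting, "job6")
after_submit_running = list(running)
after_submit_waiting = list(waiting)

# The sixth job completes: job1 gets a slot back and resumes.
finish(running, waiting, "job6")
```

In this sketch the resumed job is treated as the newest running job; whether it should instead keep its original age is a design choice I have not settled.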


I tested the above in my lab setup, and this is what I found:


Hard Run Time setting:

I configured a queue with Hard Run Time set to 3 minutes and ran a job that takes more than 3 minutes.

The job was killed as soon as the 3-minute interval elapsed.

(As per the Sun Grid Engine documentation, a SIGKILL signal is sent and the job is killed.)


Soft Run Time setting:

I configured a queue with Soft Run Time set to 3 minutes and Notify Interval set to 60 seconds, and ran the same job.

The job was killed.

(Again, as per the documentation, a SIGUSR1 signal is sent as a warning after 3 minutes, and a SIGKILL signal is sent to kill the job once the Notify Interval is over.)
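For reference, the warning signal can be caught by the job itself. The following is a minimal sketch, assuming the job is a Python script on a POSIX system; the handler name `on_soft_limit` and the `warned` flag are hypothetical. It only records that the warning arrived, and what the job should then do (save its state, exit cleanly, and so on) is left open. The last line simulates the warning by sending SIGUSR1 to the process itself, so the sketch can be tried outside the grid.

```python
import os
import signal

warned = False  # set to True once the soft-limit warning has been received

def on_soft_limit(signum, frame):
    """Handler for the SIGUSR1 warning; here it only records the event."""
    global warned
    warned = True

# Install the handler so SIGUSR1 no longer terminates the process.
signal.signal(signal.SIGUSR1, on_soft_limit)

# Simulate the warning by signalling our own process (for testing only).
os.kill(os.getpid(), signal.SIGUSR1)
```

Catching the warning does not prevent the later SIGKILL, which cannot be caught, so this alone does not give the suspend-and-resume behaviour I am looking for.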


But as per the problem statement, I don't want the jobs to be killed. Instead, a job should be suspended, rejoin the queue, and resume once it gets a slot.

How can I achieve this? Is it possible to write a script that reschedules the job so that it resumes, rather than killing it?

I found that using the Qalter option (in the QMON GUI) I could reschedule the job manually, but this is not a solution for systems in a real-time environment.

Is this possible through scripting, or is there any other option in the QMON GUI that can solve this problem?

Please help me solve this issue.

