[GE users] SGE 6.5 Scheduler Query--Reference : Mr.Sanjeev Patil

veerendra_n veerendra at yashasvi.co.in
Wed Feb 18 16:32:31 GMT 2009

Hi All,

I would be very grateful if you could help me with some questions about Sun Grid Engine 6.2.

My current question concerns the hard and soft run time limits set on jobs:

      Let's say we create many queues. One of the queues is for 1-minute jobs (i.e. short jobs that should take about 1 minute to complete). Suppose 5 jobs can run simultaneously and all the slots are occupied. Ideally each job should take around 1 minute, but for some reason the jobs actually run longer. If a sixth job is submitted, it should get priority since it is a short job. The way I would like this to work is that the oldest running job is suspended, which frees one license; the sixth job (just submitted) runs; and the oldest job goes back into the queue to wait for the license. When the sixth job finishes, the oldest job can get the license again (it is in the queue and will be processed in queue order).
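A minimal sketch of the policy described above, written as a toy Python model rather than anything Grid Engine provides. The class name `ShortQueue`, its methods, and the job names are hypothetical and exist only for illustration; the model assumes one slot equals one license and that waiting jobs are served first in, first out.

```python
from collections import deque


class ShortQueue:
    """Toy model of the desired preemption policy; not Grid Engine code."""

    def __init__(self, slots):
        self.slots = slots
        self.running = []        # job names, oldest first
        self.waiting = deque()   # suspended jobs waiting for a slot, FIFO

    def submit(self, job):
        if len(self.running) >= self.slots:
            # All slots are busy: suspend the oldest running job and requeue it.
            oldest = self.running.pop(0)
            self.waiting.append(oldest)
        self.running.append(job)

    def finish(self, job):
        self.running.remove(job)
        if self.waiting:
            # The freed slot goes to the job at the head of the wait list.
            self.running.append(self.waiting.popleft())


q = ShortQueue(slots=5)
for name in ["job1", "job2", "job3", "job4", "job5"]:
    q.submit(name)

q.submit("job6")   # job1 is suspended and job6 takes its slot
print(q.running, list(q.waiting))

q.finish("job6")   # job1 gets a slot again
print(q.running, list(q.waiting))
```

In this model the suspended job keeps its state and simply waits, which is the behaviour asked for here, as opposed to the job being killed when it overruns.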

I tested the above in my lab setup, and the following is what I found:

Hard Run Time setting :

     I configured a queue with Hard Run Time set to 3 minutes and tried to execute a job which takes more than 3 minutes.

I found that the job was killed once the 3-minute interval had elapsed.

(As per the Sun Grid Engine documentation, a SIGKILL signal is sent and the job gets killed.)

Soft Run Time Setting:

    I configured a queue with Soft Run Time set to 3 minutes and Notify Interval set to 60 sec, and tried executing the same job.

The job was killed.

(Again as per the documentation, a SIGUSR1 signal is sent as a warning after 3 minutes, and a SIGKILL signal is sent to kill the job once the Notify Interval is over.)
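One way a job could react to the warning described above is to catch the signal and save its state before the SIGKILL arrives. The following is only a sketch under stated assumptions: the warning is taken to be SIGUSR1, the names `run`, `save_state` and `on_soft_limit` are hypothetical, and the use of exit status 99 to ask for a requeue is an assumption that should be checked against the local Grid Engine documentation and configuration.

```python
import os
import signal
import time

# Assumption: exit status 99 asks Grid Engine to requeue the job.
# Verify this against the local installation before relying on it.
REQUEUE_EXIT_CODE = 99

warned = False


def on_soft_limit(signum, frame):
    """Record that the soft run time warning has been received."""
    global warned
    warned = True


# Assumption: the soft limit warning is delivered as SIGUSR1.
signal.signal(signal.SIGUSR1, on_soft_limit)


def run(work_steps, save_state):
    """Do the work step by step; stop and save state once warned."""
    for step in range(work_steps):
        if warned:
            save_state(step)          # hypothetical checkpoint hook
            return REQUEUE_EXIT_CODE
        time.sleep(0.01)              # stands in for one unit of real work
    return 0


# No warning has been received yet, so this short run completes normally.
exit_code = run(3, lambda step: None)
print(exit_code)
```

A real job script would pass the return value of `run` to `sys.exit`, and would need its own way to restart from the saved step when it runs again.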

But as per the problem statement, I don't want the jobs to be killed; they should be suspended, rejoin the queue, and resume once they get a slot.

How can I achieve this? Is it possible to write a script that reschedules the job so that it resumes, rather than killing it?

I found that using the Qalter option (in the QMON GUI), I could reschedule the job manually, but this is not a solution for systems in a real-time environment.

Is it possible through scripting, or is there any other option in the QMON GUI which can solve this problem?
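As a starting point for the scripting route, the manual step could be wrapped around the `qmod` command line tool. This is a sketch, not a tested solution: it assumes `qmod` is on the PATH and that the `-sj`, `-usj` and `-rj` options (suspend, unsuspend and reschedule a job) behave as described in the local qmod man page. The helper names and the job id 1234 are hypothetical, and rescheduling may additionally require the job to be rerunnable and the caller to have operator rights.

```python
import subprocess

# Assumption: these qmod options match the installed Grid Engine version.
QMOD_FLAGS = {
    "suspend": "-sj",      # suspend a running job
    "resume": "-usj",      # unsuspend a suspended job
    "reschedule": "-rj",   # put a running job back into the pending list
}


def qmod_command(action, job_id):
    """Build the qmod argument list for one action on one job."""
    return ["qmod", QMOD_FLAGS[action], str(job_id)]


def run_qmod(action, job_id):
    """Run qmod and return the completed process (needs a Grid Engine host)."""
    return subprocess.run(
        qmod_command(action, job_id),
        capture_output=True,
        text=True,
        check=False,
    )


# Only build and show the command here; 1234 is a made-up job id.
print(qmod_command("suspend", 1234))
```

A script run periodically could pick the oldest running job in the short queue, call `run_qmod("suspend", job_id)` when a new short job is waiting, and call `run_qmod("resume", job_id)` once a slot is free again.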

Please help me solve this issue.

Will be waiting for your reply.


Yashasvi Information Solutions Pvt.Ltd

#418 , 17th Main , 10th Cross

JP Nagar , 2nd Phase

Bangalore - 560 078

Mobile : +91-9972520661

Email    :  veerendra at yashasvi.co.in
