[GE users] SGE 6.5 Scheduler Query--Reference
veerendra at yashasvi.co.in
Sun Feb 22 05:44:34 GMT 2009
The jobs that are run are typical ASIC design jobs (layout and verification
jobs) and each of these jobs requires a license. I'm not sure if I have
stated the requirement clearly.
1. We will have a short queue (short.q) configured with 5 slots
and a time limit of 5 min (a soft limit).
2. Once 5 jobs are running, all 5 slots of the queue are occupied. If
we then submit a 6th job and any running job has taken more than 5 min,
that job should be put on hold and the 6th job should be executed.
How can this be done automatically? How would I use a co-scheduler, as you
suggested? Do you have a sample of how to implement checkpointing?
I need some help...
From: reuti [mailto:reuti at staff.uni-marburg.de]
Sent: 22 February 2009 01:33
To: users at gridengine.sunsource.net
Subject: Re: [GE users] SGE 6.5 Scheduler Query--Reference : Mr.Sanjeev
On 18.02.2009 at 17:32, veerendra_n wrote:
> Hi All,
> I will be very grateful if you can help me clear up some queries
> about Sun Grid Engine 6.2.
> My current query is regarding the Hard Run Time limit and Soft Run
> Time limit set on jobs:
> Let's say we create many queues. One of the queues is for 1 min
> jobs (i.e. jobs that should take about 1 min to complete, short
> jobs). Now let's say 5 jobs can run simultaneously and all the
> slots are occupied. Ideally the jobs should have taken around 1 min.
> But for some reason, the jobs actually run longer. If a sixth
> job is queued, it should get priority since it is a short job.
> The way it should be done is that the oldest job is suspended, one
> license is freed up, the sixth job (just submitted) runs and the
> oldest one goes back in the queue for the license. So when the sixth
> job is over, the oldest job can get the license again (it is in
> the queue and will be processed based on the queue).
> I tried to test the above in my lab setup and the following is
> what I found:
> Hard Run Time setting:
> I configured a queue with Hard Run Time set to 3 minutes and
> tried to execute a job which takes more than 3 minutes.
> I found that the job got killed once the 3 minute interval was over.
> (As per the Sun Grid Engine documentation, a SIGKILL signal is sent
> and the job gets killed.)
> Soft Run Time setting:
> I configured a queue with Soft Run Time set to 3 minutes and
> Notify Interval to 60 sec and tried executing the same job.
> The job got killed.
> (Again as per the documentation, a SIGUSR1 signal is sent as a warning
> after 3 minutes and a SIGKILL signal is sent to kill the job once
> the Notify Interval is over.)
> But as per the problem statement, I don't want the jobs to be
> killed. They should be suspended, rejoin the queue, and
> resume once they get a slot.
This is not implemented in SGE. Once a job was allowed to start, it
is supposed to run up to its end. It might be suspended, but it will
still occupy the granted resources. This is not only a problem of
SGE, but also of your application: you would have to instruct it to
release its license temporarily.
You could use a co-scheduler, which would check the waiting and
running jobs. When it discovers that another job should run, it has
to a) put a running job on hold (to prevent its immediate restart),
and b) reschedule the job. When there is no waiting job left, the
held (and rescheduled) one could be released and would start
again. When I write restart, I mean it in exactly this way: without
any checkpointing, your job will always restart from the beginning.
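
As a rough illustration of the co-scheduler idea, here is a minimal sketch of
the decision logic for one pass. It only builds the list of commands to issue
and does not run them. qhold, qmod -rj and qrls are the standard SGE commands
for the hold, reschedule and release steps; everything else (the Job class,
plan_actions, held_by_us, the 300 second limit) is a hypothetical name or value
chosen for this example. Collecting the running and waiting jobs from qstat is
left out, and the jobs are assumed to be rerunnable so that rescheduling is
permitted.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Job:
    job_id: str
    runtime_s: int = 0  # seconds since the job started (0 for waiting jobs)


def plan_actions(running: List[Job], waiting: List[Job],
                 held_by_us: Set[str], limit_s: int = 300) -> List[List[str]]:
    """Return the SGE commands one co-scheduler pass would issue."""
    # Waiting jobs that we did not put on hold ourselves are the real demand.
    demand = [j for j in waiting if j.job_id not in held_by_us]
    if not demand:
        # Nothing else is waiting: release every job we held earlier.
        return [["qrls", job_id] for job_id in sorted(held_by_us)]

    # Running jobs over the limit, longest-running first.
    overdue = sorted((j for j in running if j.runtime_s > limit_s),
                     key=lambda j: j.runtime_s, reverse=True)
    actions = []
    # Free at most one slot per waiting job.
    for victim in overdue[:len(demand)]:
        actions.append(["qhold", victim.job_id])        # a) prevent immediate restart
        actions.append(["qmod", "-rj", victim.job_id])  # b) reschedule the job
    return actions


if __name__ == "__main__":
    running = [Job("101", 420), Job("102", 90), Job("103", 600)]
    waiting = [Job("106")]
    for cmd in plan_actions(running, waiting, held_by_us=set()):
        print(" ".join(cmd))
```

The caller would have to remember which jobs it put on hold (held_by_us), since
a rescheduled job shows up among the waiting jobs and must not be counted as
new demand. As noted above, the rescheduled job starts from the beginning
unless the application does its own checkpointing.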
> How can I achieve this? Is it possible to write a script that
> reschedules the job to resume rather than killing it?
> I found that using the Qalter option (in the QMON GUI), I could
> reschedule the job manually, but this is not a solution for systems
> in a real-time environment.
> Is it possible through scripting, or is there any other option in
> the QMON GUI which can solve this problem?
> Please help me solve this issue.
> Will be waiting for your reply.
To unsubscribe from this discussion, e-mail:
[users-unsubscribe at gridengine.sunsource.net].