[GE users] SGE 6.5 Scheduler Query--Reference

reuti reuti at staff.uni-marburg.de
Mon Feb 23 12:13:13 GMT 2009


Hi,

On 22.02.2009 at 06:44, veerendra_n wrote:

> The jobs that are run are typical ASIC design jobs (layout and
> verification jobs), and each of these jobs requires a license. I'm not
> sure whether I have stated the requirement clearly.
>
> 1. We will have a short queue (short.q) configured with 5 slots and a
>    time limit of 5 min (a soft limit).
> 2. If we have started 5 jobs, all 5 will occupy the queue. If we now
>    submit a 6th job, any job that has been running for more than 5 min
>    should be put on hold and the 6th job should be executed.
>
> How can this be done automatically? How would we use a co-scheduler, as
> you suggested? Do you have a sample of how to implement checkpointing?

This is far from trivial. It would be best to have someone at your
location look into it. 5 minutes looks like a short turnaround.
How long do your jobs usually run?

The first thing to check is whether your application can be
triggered to go to sleep and give its license back at all (I assume
the licenses are counted by something like FLEXlm).
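
For the co-scheduler idea quoted further down, the decision step could
be sketched roughly as follows. This is only a minimal sketch, not a
tested setup: the names RunningJob, plan_preemption and plan_release are
made up for illustration, the 300 s default just mirrors the 5 min soft
limit mentioned above, and collecting the running and waiting jobs (e.g.
by parsing qstat output) is left out. It assumes the jobs are rerunnable,
and without checkpointing a rescheduled job restarts from the beginning.
The functions only build the qhold / qmod -rj / qrls command lines and
do not execute anything.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunningJob:
    """Hypothetical record for one running job in short.q."""
    job_id: int
    runtime_s: int  # seconds the job has been running so far


def plan_preemption(running: List[RunningJob], waiting_count: int,
                    free_slots: int, soft_limit_s: int = 300) -> List[List[str]]:
    """Build the commands a co-scheduler would issue to make room.

    Only jobs running longer than soft_limit_s are candidates, and the
    longest-running ones are picked first.
    """
    # How many waiting jobs cannot start without freeing a slot.
    needed = max(0, waiting_count - free_slots)
    overdue = sorted(
        (job for job in running if job.runtime_s > soft_limit_s),
        key=lambda job: job.runtime_s,
        reverse=True,
    )
    commands: List[List[str]] = []
    for job in overdue[:needed]:
        # a) put the job on hold, so it is not restarted immediately
        commands.append(["qhold", str(job.job_id)])
        # b) reschedule the job (assumes the job is rerunnable)
        commands.append(["qmod", "-rj", str(job.job_id)])
    return commands


def plan_release(held_job_ids: List[int], waiting_count: int) -> List[List[str]]:
    """Release the held jobs once no other job is waiting any more."""
    if waiting_count > 0:
        return []
    return [["qrls", str(job_id)] for job_id in held_job_ids]
```

A real co-scheduler would call these periodically and pass each returned
command to something like subprocess.run(); whether the license is
actually given back when the job leaves the slot still depends on the
application.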

-- Reuti


> I need some help....
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 22 February 2009 01:33
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] SGE 6.5 Scheduler Query--Reference : Mr.Sanjeev Patil
>
> Veerendra,
>
> On 18.02.2009 at 17:32, veerendra_n wrote:
>
>> Hi All,
>>
>> I would be very grateful if you could help me clear up some queries
>> about Sun Grid Engine 6.2.
>>
>> Query:
>> My current query is about the Hard Run Time limit and Soft Run Time
>> limit set on jobs.
>> Let's say we create many queues. One of the queues is for 1 min
>> jobs (i.e. short jobs that should take about 1 min to complete). Now
>> let's say 5 jobs can run simultaneously and all the slots are
>> occupied. Ideally the jobs should have taken around 1 min, but for
>> some reason they actually run longer. If a sixth job is queued, it
>> should get priority since it is a short job. The way it should work
>> is that the oldest job is suspended and one license is freed up, the
>> sixth job (just submitted) runs, and the oldest one goes back into
>> the queue for the license. So when the sixth job is over, the oldest
>> job can get the license again (it is in the queue and will be
>> processed based on the queue).
>>
>> I tried to test the above in my lab set up and the following is
>> what I found :
>>
>> Hard Run Time setting:
>>     I configured a queue with Hard Run Time set to 3 minutes and
>> tried to execute a job which takes more than 3 minutes.
>> I found that the job got killed once the 3-minute interval was
>> over.
>> (As per the Sun Grid document, a SIGKILL signal is sent and the job
>> gets killed.)
>>
>> Soft Run Time setting:
>>     I configured a queue with Soft Run Time set to 3 minutes and
>> Notify Interval set to 60 sec and tried executing the same job.
>> The job got killed.
>> (Again as per the document, a SIGUSR1 signal is sent as a warning
>> after 3 minutes and a SIGKILL signal is sent to kill the job once
>> the Notify Interval is over.)
>>
>> But as per the problem statement, I don't want the jobs to be
>> killed; they should be suspended and rejoin the queue, and each job
>> should resume once it gets a slot.
>
> This is not implemented in SGE. Once a job has been allowed to start,
> it is supposed to run to its end. It might be suspended, but it will
> still occupy the granted resources. This is not only a problem of
> SGE, but also of your application: you would have to instruct it to
> release its license temporarily.
>
> You could use a co-scheduler, which would check the waiting and
> running jobs. When it discovers that another job should run, it has
> to a) put a running job on hold (to prevent its immediate restart),
> and b) reschedule the job. When there is no waiting job left, the
> held (and rescheduled) one could be released and would restart
> again. When I write restart, I mean it in exactly this way: without
> any checkpointing, your job will always restart from the beginning.
>
> -- Reuti
>
>
>> How can I achieve this? Is it possible to write a script that
>> reschedules the job so that it resumes, rather than killing it?
>> I found that with the Qalter option (in the QMON GUI) I could
>> reschedule the job manually, but this is not a solution for systems
>> in a real-time environment.
>> Is it possible through scripting, or is there any other option in
>> the QMON GUI which can solve this problem?
>> Please help me solve this issue.
>>
>> Will be waiting for your reply.
>>
>> Veerendra
>>
>>
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=111323
>
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=112616

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


