[GE users] job suspension based on queue priority

jigar_halani jigar at talentain.com
Sun Nov 9 10:48:37 GMT 2008


Yes, that will be fine. But can I re-queue these jobs? Is there any method by which we can do so?
Only very few jobs will exceed the time limit, and they are very small. Users will be informed by e-mail, so they can either delete the job or run it in the medium queue.

Regards,
Jigar Halani

PS: It bounced back, so I am copying Reuti as well.

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Friday, November 07, 2008 4:13 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] job suspension based on queue priority

Hi,

On 07.11.2008 at 06:35, Jigar Halani wrote:

> Hi Reuti,
>
> Thanks for the reply. Can I re-queue these suspended jobs?

When you requeue the jobs, they will start from the beginning, unless
your application supports checkpointing. In that case, a suspend could
automatically put the job back into the waiting state.

-- Reuti


> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Thursday, November 06, 2008 4:20 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] job suspension based on queue priority
>
> On 06.11.2008 at 07:52, Jigar Halani wrote:
>
>> Hi Reuti,
>>
>> Thanks for the reply. Please find the comments below.
>>
>> --
>> Thanks and regards,
>> Jigar Halani
>>
>> -----Original Message-----
>> From: reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Wednesday, November 05, 2008 8:29 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] job suspension based on queue priority
>>
>> Hi,
>>
>> On 05.11.2008 at 15:42, Jigar Halani wrote:
>>
>>> I am facing some issues in my environment; the details are below.
>>> I would appreciate any help.
>>>
>>> I have 100 systems in total: 50 are configured under the small.q
>>> queue (small jobs will run on these systems), 30 are configured
>>> under medium.q (medium jobs will run on these systems), and 20
>>> are configured under big.q (large jobs will run on these
>>> systems). I am using SGE 6.1u5.
>>
>> You mean: every system has only one queue on it?
>>
>> Yes, all the systems will have only one queue on them.
>>
>>>
>>> ·         I have to configure queues based on a time limit, e.g.
>>> small.q will only allow jobs to execute for 1 minute. I have set
>>> a CPU limit of 1 minute, which works, but it kills the job, whereas
>>> the users want the job to be suspended. Is there any option for this?
>>
>> You can suspend jobs, but when there is only one queue on each
>> system, it hardly makes sense. What would be your condition to
>> unsuspend the job again?
>>
>> Yes, each system will have only one queue on it. But this will help
>> me get licenses on time and avoid suspending jobs indefinitely, as
>> users are only going to test their jobs on this queue within the
>> time limit. To resume (unsuspend) such a job, users will have to
>> manually click the button or run the resume command.
>
> If you do this, you will have "blocked/idling" slots AFAICS. You
> suspend a job and it still blocks a slot in the cluster.
>
> IMO it's highly unhandy that the owner of the suspended job has to
> look into resuming his job again after it was suspended for someone
> else's job.
>
> BTW: I would tend more to allow every job to run everywhere, and the
> total number of slots per queue type could be limited with an RQS,
> just to spread the load. But this is a different point and does not
> solve your main issue.
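
As a hedged illustration of the RQS idea above (not part of the original reply), the following Python sketch renders a resource quota set with one slot cap per queue. The rule set name queue_slot_caps and the slot numbers 50/30/20 are assumptions for the example; they reuse the host counts from the question and are not real slot totals. The layout follows the format described in sge_resource_quota(5); the resulting text would be written to a file and loaded with qconf -Arqs.

```python
def render_rqs(name, slots_per_queue, description="per-queue slot caps"):
    """Return the text of a resource quota set with one limit rule per queue."""
    lines = [
        "{",
        f"   name         {name}",
        f'   description  "{description}"',
        "   enabled      TRUE",
    ]
    # One rule per queue: a job is checked against the first rule whose
    # filter (here: the queue name) matches it.
    for queue, slots in slots_per_queue.items():
        lines.append(f"   limit        queues {queue} to slots={slots}")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Illustrative values only (assumed, not taken from a real cluster).
rqs_text = render_rqs(
    "queue_slot_caps",
    {"small.q": 50, "medium.q": 30, "big.q": 20},
)
```

With all three queues defined on all hosts, such a rule set would cap how many slots each queue type may occupy cluster-wide, while leaving the choice of host to the scheduler.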
>
>>> ·         Second, I also have no clue how to configure job
>>> suspension based on queue priority. E.g. let's say I have 10
>>> licenses of application-A and I am currently running 10 jobs in
>>> big.q. Now a user submits a job in small.q, so SGE should suspend
>>> the last job submitted in big.q and take its license for the job
>>> submitted in small.q.
>>
>> This is not possible in SGE without the help of a co-scheduler. As
>> all licenses are used up, SGE wouldn't schedule any additional job,
>> especially as suspended jobs also still use the resources. The
>> co-scheduler would need to suspend a job (which must also agree to
>> give the license back) and adjust the license count in SGE.
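
The decision a co-scheduler of this kind has to make can be sketched in Python as follows. This is only a minimal model under stated assumptions, not an existing tool: the Job class and both functions are hypothetical, the submit times would have to come from the site's own accounting, and the sketch assumes that a suspended job really hands its license back, which the application must support. It only plans the qmod -sj calls; adjusting the license count in SGE is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    job_id: int
    queue: str
    submit_time: float
    suspended: bool = False


def pick_job_to_suspend(running: List[Job], victim_queue: str = "big.q") -> Optional[Job]:
    """Return the most recently submitted, not yet suspended job in victim_queue."""
    candidates = [j for j in running if j.queue == victim_queue and not j.suspended]
    return max(candidates, key=lambda j: j.submit_time, default=None)


def plan_preemption(running: List[Job], total_licenses: int, pending_small_jobs: int):
    """Return the suspend commands (as argument lists) needed to free licenses."""
    # Assumption: a suspended job has released its license.
    in_use = sum(1 for j in running if not j.suspended)
    actions = []
    for _ in range(pending_small_jobs):
        if in_use < total_licenses:
            in_use += 1  # a license is free, no suspension needed
            continue
        victim = pick_job_to_suspend(running)
        if victim is None:
            break  # nothing left in big.q that could be suspended
        victim.suspended = True
        # The victim's license goes to the small job, so in_use is unchanged.
        actions.append(["qmod", "-sj", str(victim.job_id)])
    return actions
```

A real co-scheduler would additionally have to unsuspend the job (qmod -usj) once the short job has finished, and keep the license complex in SGE in step with what the license server reports.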
>>
>> Can you please suggest a good open-source co-scheduler for this
>> purpose?
>
> The idea of a co-scheduler for special applications comes up on the
> list from time to time. As they are highly specialized for exactly
> one purpose, I'm not aware of any open-source all-in-one solution.
>
> Maybe some list members could post their solutions, if they had to
> set up similar scheduling.
>
> -- Reuti
>
>
>> -- Reuti
>>
>>
>>> Once the job is over (as the job will not run for more than 1
>>> minute, users are OK with suspending big.q jobs), the resource
>>> (license) will be given back to the suspended job. I have gone
>>> through the thread
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=72816
>>> but have not found the solution.
>>>
>>> Awaiting your reply, and thanks in advance.
>>>
>>> --
>>> Thanks and regards,
>>> Jigar Halani
>>> Talentain Technologies Pvt. Ltd.
>>>
>>>
>>> __________ Information from ESET Smart Security, version of virus
>>> signature database 3586 (20081105) __________
>>>
>>> The message was checked by ESET Smart Security.
>>>
>>> http://www.eset.com
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?
>> dsForumId=38&dsMessageId=88108
>>
>> To unsubscribe from this discussion, e-mail: [users-
>> unsubscribe at gridengine.sunsource.net].
>



