[GE users] Managing the Queue + Nodes

reuti reuti at staff.uni-marburg.de
Wed Nov 4 00:09:55 GMT 2009


Hi Roger,

On 03.11.2009 at 11:02, scu98rkr wrote:

> Hi Reuti,
>
> Thanks for your ideas, but I seem to be having a similar problem
> following your instructions as I get when reading the manual.

Which chapter is unclear? There is also a wiki:
http://wikis.sun.com/display/gridengine62u3/Managing+Policies


> In that I don't really know what they mean.
>
> First you suggest
>
> `qconf -msconf`: max_reservation 25
>
>
> What does this mean? What is a reservation? Is it the number of jobs
> queued?

man sched_conf


> qsub -R y ...

man qsub
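
In short: with max_reservation set above 0, the scheduler holds freed slots for up to that many pending jobs that asked for a reservation, so a two-slot job is not starved by serial jobs slipping into every single free slot. A job asks for this with -R y. A minimal sketch of the two steps follows. It only prints the commands, and the PE name "smp" and the script name "g09_job.sh" are hypothetical placeholders, not names from this thread:

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them, so it also
# works on a machine without SGE installed.
# "smp" and "g09_job.sh" are hypothetical placeholders.

# Build a qsub command line that requests a reservation (-R y) for a
# parallel job needing a given number of slots in a parallel environment.
reserve_cmd() {
    pe=$1
    slots=$2
    script=$3
    echo "qsub -R y -pe $pe $slots $script"
}

# 1) Inspect the current scheduler setting (0 means reservation is off):
echo "qconf -ssconf | grep max_reservation"

# 2) Submit the two-slot job with a reservation request:
reserve_cmd smp 2 g09_job.sh
```

Reservation is most useful when the other jobs state a runtime limit (h_rt) or a sensible default_duration is set in the scheduler configuration, because the scheduler then knows when slots will become free.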


> Sorry, I really have no idea. What does this flag do?
>
> Also, what is a share tree?

http://www.sun.com/blueprints/0703/817-3179.pdf

man sge_priority

It's one of the three SGE ticket policies (share tree, functional and
override). When you want to honor only the current usage, you don't
need a share tree policy. The functional policy will do, and you can
define functional shares for users too.
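
As a rough illustration of the functional setup: equal user shares are commonly configured with enforce_user and auto_user_fshare in the global configuration (`qconf -mconf`) and a non-zero weight_tickets_functional in the scheduler configuration (`qconf -msconf`). The helper below is a hypothetical sketch that checks configuration text for these three parameters. The values 100 and 10000 in the demo input are assumptions for illustration, not recommendations from this thread:

```shell
#!/bin/sh
# Sketch only. Reads configuration text on stdin and reports whether the
# parameters commonly used for equal functional shares are set.
# check_fairshare is a hypothetical helper, not an SGE command.

check_fairshare() {
    awk '
        $1 == "enforce_user"              { eu = $2 }
        $1 == "auto_user_fshare"          { fs = $2 }
        $1 == "weight_tickets_functional" { wf = $2 }
        END {
            # Users must be auto-created with a positive share, and
            # functional tickets must carry a positive weight.
            ok = (eu == "auto" && fs + 0 > 0 && wf + 0 > 0)
            if (ok) print "functional fairshare: configured"
            else    print "functional fairshare: incomplete"
            exit (ok ? 0 : 1)
        }'
}

# Demo with illustrative values (assumed, not taken from a real cluster):
check_fairshare <<'EOF'
enforce_user                 auto
auto_user_fshare             100
weight_tickets_functional    10000
EOF
```

On a real installation the input would come from the cluster itself, for example `{ qconf -sconf; qconf -ssconf; } | check_fairshare`.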

-- Reuti


> Thanks Roger
>
>
>> Hi,
>>
>> On 02.11.2009 at 17:12, scu98rkr wrote:
>>
>>
>>> We have a 64 node cluster consisting of 32 dual node machines. There
>>> are about 3 users. 2 of the users run single-processor jobs that
>>> usually last between 7 hours and 1-2 days, and they tend to queue up
>>> a batch of test cases, i.e. up to 70 jobs each.
>>>
>>> I'm running Gaussian and run many different types of jobs, from 1-2
>>> hour jobs to composite calculations lasting several days. Recently
>>> I've started running dual processor open mp2 jobs. I tend to just
>>> run a few jobs at a time, although I occasionally run batches.
>>>
>>> We've never really come to a satisfactory conclusion on how to
>>> manage the resources most efficiently. Quite often 1-2 users will
>>> not be using the node, so I want all of the resources open to
>>> everyone. I've set up the share policy to 33% each, so queued jobs
>>> will be ordered according to how much computing power each user is
>>> using on the cluster. That is good, but it still means the user with
>>> the fewest jobs has to wait until the previous jobs have finished
>>> before their (possibly 1 hour) job will run.
>>>
>>> Also, as I mentioned earlier, I've started running dual processor
>>> jobs. I've just come back over the weekend to find none of my jobs
>>> have run, even after being at the front of the queue, because at no
>>> point have 2 nodes on the same machine been free (rather
>>> unsurprisingly). I can pretend the job only uses 1 processor, but
>>> I've noticed that if you specify 2 processors and someone else
>>> starts a job on the same machine, the computation becomes much
>>> slower than if you'd specified 1 processor.
>>>
>>> What I really need SGE to do is monitor the usage of each user and
>>> check if any user is using more than 33% of the cluster. If there
>>> are currently any other jobs queued, it needs to suspend the jobs of
>>> the user over 33% and replace them with the queued jobs. SGE doesn't
>>> seem to have any problem suspending jobs, so can it run other jobs
>>> in that suspended space?
>>>
>>> I don't want to limit people's access to queues because I want the
>>> whole cluster available to 1 user if there is space.
>>>
>>
>> a) as I read between the lines, you defined a PE with allocation_rule
>> $pe_slots and request two slots. What you need to avoid serial jobs
>> slipping in, is to turn on resource reservation in the scheduler
>> configuration `qconf -msconf`:
>>
>> max_reservation 25
>>
>> (or a more appropriate value)
>>
>> and submit jobs with: qsub -R y ...
>>
>> b) a share tree will honor the past usage, but AFAICS you request
>> only to honor the usage of the cluster right now:
>>
>> http://gridengine.info/2006/01/17/easy-setup-of-equal-user-fairshare-policy
>>
>> -- Reuti
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224709
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
>
>
> -- 
> "He made him ride on the high places of the earth, that he might  
> eat the increase of the fields"
> Deuteronomy 32:13
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224784

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224901
