[GE users] Managing the Queue + Nodes

scu98rkr scu98rkr at gmail.com
Tue Nov 3 10:02:07 GMT 2009


Hi Reuti,

Thanks for your ideas, but I seem to be having the same problem
following your instructions as I get when reading the manual, in that I
don't really know what they mean.

First you suggest

`qconf -msconf`: max_reservation 25


What does this mean? What is a reservation? Is it the number of jobs queued?

qsub -R y ...


Sorry, I really have no idea: what does this flag do?

Also, what is a share tree?

Thanks, Roger


> Hi,
>
> On 02.11.2009 at 17:12, scu98rkr wrote:
>
>   
>> We have a 64 node cluster consisting of 32 dual node machines. There are
>> about 3 users. 2 of the users run single processor jobs that usually
>> last between 7 hours and 1-2 days, and they tend to queue up a batch of
>> test cases, i.e. up to 70 jobs each.
>>
>> I'm running Gaussian and run many different types of jobs, from 1-2 hour
>> jobs to composite calculations lasting several days. Recently I've started
>> running dual processor open mp2 jobs. I tend to just run a few jobs at a
>> time, although I occasionally will run batches.
>>
>> We've never really come to a satisfactory conclusion on how to manage
>> the resources most efficiently. Quite often 1-2 users will not be using
>> the nodes, so I want all of the resources open to everyone. I've set
>> up the share policy to 33% each, so queued jobs will be ordered according
>> to how much computing power each user is using on the cluster. That is
>> good, but it still means the user with the fewest jobs has to wait until
>> the previous jobs have finished before their (possibly 1 hour) job
>> will run.
>>
>> Also, as I mentioned earlier, I've started running dual processor jobs.
>> I've just come back after the weekend to find that none of my jobs have
>> run, even after being at the front of the queue, because at no point have
>> 2 nodes on the same machine been free (rather unsurprisingly). I can
>> pretend the job only uses 1 processor, but I've noticed that if you
>> specify 2 processors and someone else starts a job on the same machine,
>> the computation becomes much slower than if you'd specified 1 processor.
>>
>> What I really need SGE to do is monitor the usage of each user and check
>> if any user is using more than 33% of the cluster. If there are currently
>> any other jobs queued, it needs to suspend the jobs of the user over 33%
>> and replace them with the queued jobs. SGE doesn't seem to have any
>> problem suspending jobs, so can it run other jobs in that suspended space?
>>
>> I don't want to limit people's access to queues because I want the whole
>> cluster available to 1 user if there is space.
>>     
>
> a) as I read between the lines, you defined a PE with allocation_rule  
> $pe_slots and request two slots. What you need to avoid serial jobs  
> slipping in, is to turn on resource reservation in the scheduler  
> configuration `qconf -msconf`:
>
> max_reservation 25
>
> (or a more appropriate value)
>
> and submit jobs with: qsub -R y ...
>
> b) a share tree will honor the past usage, but AFAICS you request  
> only to honor the usage of the cluster right now:
>
> http://gridengine.info/2006/01/17/easy-setup-of-equal-user-fairshare-policy
>
> -- Reuti
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224709
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>   
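
For reference, the two steps in (a) of the quoted reply could look roughly like the sketch below. With resource reservation switched on, the scheduler holds slots for a pending job that cannot start yet, so that smaller jobs do not keep taking them as they become free. max_reservation caps how many jobs may hold such a reservation in one scheduling run (the default of 0 disables reservation), and -R y asks for a reservation for that particular job. The PE name "smp" and the script name gaussian_job.sh are made-up placeholders, not names from this cluster, and the DRY_RUN switch is only an assumption added so the snippet can be tried on a machine without SGE.

```shell
#!/bin/sh
# Sketch only. PE_NAME and JOB_SCRIPT are hypothetical placeholders.
PE_NAME="smp"                  # assumed: a PE with allocation_rule $pe_slots
SLOTS=2                        # both slots must be on the same machine
JOB_SCRIPT="gaussian_job.sh"   # hypothetical job script
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the command, 0 = really run it

# Step 1 (admin, done once): run `qconf -msconf`, which opens the scheduler
# configuration in an editor, and set the line
#     max_reservation 25
# (or a more appropriate value).

# Step 2 (user, per job): submit the two-slot job with a reservation request.
SUBMIT_CMD="qsub -R y -pe $PE_NAME $SLOTS $JOB_SCRIPT"

if [ "$DRY_RUN" = "0" ]; then
    # Show the current reservation setting, then submit for real.
    qconf -ssconf | grep max_reservation
    $SUBMIT_CMD
else
    echo "$SUBMIT_CMD"
fi
```

With DRY_RUN left at 1 the script only prints the qsub line; setting DRY_RUN=0 on a submit host would actually submit the job.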


-- 
"He made him ride on the high places of the earth, that he might eat the increase of the fields"
Deuteronomy 32:13

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224784



