[GE users] Managing the Queue + Nodes

pvdmeer pvdmeer at gmail.com
Tue Nov 3 16:02:41 GMT 2009



Hi!

I haven't tried a fair share policy like the one you describe. Maybe this share tree thing will help. What we run here now are "priority queues", which are pretty neat: first-come-first-served, except that people can still test their stuff or run -very- urgent jobs in between.
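The message does not say how these priority queues are configured. One common way to get that behaviour (an assumption here, not necessarily what this site runs) is queue subordination: a second queue whose running jobs suspend the ordinary queue on the same host. The sketch below derives a hypothetical urgent.q from a dump of all.q and sets subordinate_list all.q=1, which, as I understand queue_conf(5), suspends the all.q instance on a host as soon as one urgent.q slot there is in use. Note that this suspends every all.q job on that host, not just one of them. The helper make_urgent_queue is hypothetical, and the script only calls qconf when the SGE tools are installed.

```shell
#!/bin/sh
# Sketch, assuming a site-specific setup: derive a hypothetical "urgent.q"
# from the existing all.q definition. While one urgent.q slot on a host is
# in use, the all.q instance on that host is suspended (subordinate_list).

# make_urgent_queue IN OUT (hypothetical helper): rewrite a `qconf -sq all.q`
# dump so that it defines urgent.q and subordinates all.q to it.
make_urgent_queue() {
    sed -e 's/^qname .*/qname                 urgent.q/' \
        -e 's/^subordinate_list .*/subordinate_list      all.q=1/' \
        "$1" > "$2"
}

# Only touch the cluster when the SGE client tools are installed.
if command -v qconf >/dev/null 2>&1; then
    qconf -sq all.q > all.q.conf        # dump the current all.q definition
    make_urgent_queue all.q.conf urgent.q.conf
    qconf -Aq urgent.q.conf             # add urgent.q from the edited file
fi
```

Urgent or test jobs would then be submitted with qsub -q urgent.q job.sh. Restricting who may use urgent.q (for example through its user_lists attribute) is left out of the sketch.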

For jobs that use parallel code you should indeed set up a PE (parallel environment), and force people to use it ;). This works fine for us.
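A minimal sketch of such a PE, assuming a shared-memory PE with the hypothetical name smp that must place all of a job's slots on a single host, which is what allocation_rule $pe_slots does. The slot total of 64 matches the cluster described further down but is otherwise an assumption. The attribute list follows the 6.2-era sge_pe(5) layout; older releases may lack some fields (accounting_summary, for instance). The script writes the definition to a file and only calls qconf when it is available.

```shell
#!/bin/sh
# Sketch: a PE whose slots must all come from one host ($pe_slots), so a
# 2-slot job only starts where both processors of a machine are free.
# The name "smp" and the total of 64 slots are assumptions.
cat > smp_pe.conf <<'EOF'
pe_name            smp
slots              64
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF

# Register the PE and attach it to all.q, but only on a host with SGE tools.
if command -v qconf >/dev/null 2>&1; then
    qconf -Ap smp_pe.conf                    # add the PE from the file
    qconf -aattr queue pe_list smp all.q     # allow all.q to use it
fi
```

A dual-processor job would then request it with qsub -pe smp 2 job.sh, adding -R y once resource reservation is enabled as Reuti describes further down.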

PS

To my knowledge, a share tree is a way to share resources among projects, as opposed to the first-come-first-served approach. See:

http://www.globusworld.org/documents/FridayTutorial.pdf

(never used it, though)

On Tue, Nov 3, 2009 at 11:02 AM, scu98rkr <scu98rkr at gmail.com> wrote:
Hi Reuti,

Thanks for your ideas, but I'm having the same problem with your
instructions as I have with the manual: I don't really know what
they mean.

First you suggest

qconf -msconf: max_reservation 25


What does this mean? What is a reservation? Is it the number of jobs queued?

qsub -R y ...


Sorry, I really have no idea: what does this flag do?

Also what is a share tree ?

Thanks Roger
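To answer these two questions in terms of sched_conf(5) and qsub(1), as I understand them: a reservation is not a count of queued jobs. With resource reservation on, the scheduler sets aside slots for a high-priority pending job (here, a 2-slot job waiting for both processors of one machine) so that lower-priority serial jobs cannot keep grabbing those slots as they free up. max_reservation is the maximum number of such reservations made per scheduling run (the default, 0, disables the feature), and qsub -R y marks an individual job as eligible for one. Backfilling short jobs in front of a reservation works best when jobs request a runtime limit (h_rt); otherwise the scheduler assumes default_duration. The sketch below sets max_reservation non-interactively by editing a qconf -ssconf dump and loading it back with qconf -Msconf; set_sconf_param is a hypothetical helper, and the same change can be made by hand in the editor that qconf -msconf opens.

```shell
#!/bin/sh
# Sketch: enable resource reservation by setting max_reservation in the
# scheduler configuration. 25 is the value suggested in the thread.

# set_sconf_param FILE NAME VALUE (hypothetical helper): replace the line
# whose first field is NAME with "NAME VALUE"; all other lines are kept.
set_sconf_param() {
    file=$1 name=$2 value=$3
    awk -v n="$name" -v v="$value" '$1 == n { print n, v; next } { print }' \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Only touch the cluster when the SGE client tools are installed.
if command -v qconf >/dev/null 2>&1; then
    qconf -ssconf > sconf.txt                     # dump scheduler config
    set_sconf_param sconf.txt max_reservation 25
    qconf -Msconf sconf.txt                       # load the edited config
fi

# Jobs then ask for a reservation explicitly, e.g. a 2-slot job with a
# runtime limit in a PE named "smp" (PE name is an assumption):
#   qsub -pe smp 2 -R y -l h_rt=48:00:00 job.sh
```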


> Hi,
>
> Am 02.11.2009 um 17:12 schrieb scu98rkr:
>
>
>> We have a 64-processor cluster consisting of 32 dual-processor
>> machines. There are about 3 users. 2 of the users run single-processor
>> jobs that usually last between 7 hours and 1-2 days, and they tend to
>> queue up a batch of test cases, i.e. up to 70 jobs each.
>>
>> I'm running Gaussian, with many different types of jobs, from 1-2 hour
>> runs to composite calculations lasting several days. Recently I've
>> started running dual-processor open mp2 jobs. I tend to run just a few
>> jobs at a time, although I occasionally run batches.
>>
>> We've never really reached a satisfactory conclusion on how to manage
>> the resources most efficiently. Quite often 1-2 users will not be
>> using the nodes, so I want all of the resources open to everyone. I've
>> set up the share policy at 33% each, so queued jobs are ordered
>> according to how much computing power each user is using on the
>> cluster. That is good, but it still means the user with the fewest
>> jobs has to wait until the earlier jobs have finished before their
>> (possibly 1-hour) job will run.
>>
>> Also, as I mentioned earlier, I've started running dual-processor
>> jobs. I've just come back after the weekend to find that none of my
>> jobs have run, even after reaching the front of the queue, because at
>> no point were 2 processors on the same machine free (rather
>> unsurprisingly). (I could pretend the jobs only use 1 processor, but
>> I've noticed that if you specify 2 processors and someone else starts
>> a job on the same machine, the computation becomes much slower than
>> if you'd specified 1 processor.)
>>
>> What I really need SGE to do is monitor the usage of each user and
>> check whether any user is using more than 33% of the cluster. If any
>> other jobs are queued, it should suspend the jobs of the user who is
>> over 33% and replace them with the queued jobs. SGE doesn't seem to
>> have any problem suspending jobs, so can it run other jobs in the
>> suspended slots?
>>
>> I don't want to limit people's access to queues, because I want the
>> whole cluster available to 1 user if there is space.
>>
>
> a) Reading between the lines, you defined a PE with allocation_rule
> $pe_slots and request two slots. To keep serial jobs from slipping
> in, you need to turn on resource reservation in the scheduler
> configuration (`qconf -msconf`):
>
> max_reservation 25
>
> (or a more appropriate value)
>
> and submit jobs with: qsub -R y ...
>
> b) A share tree will honor past usage, but AFAICS you only want to
> honor the current usage of the cluster:
>
> http://gridengine.info/2006/01/17/easy-setup-of-equal-user-fairshare-policy
>
> -- Reuti
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224709
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>
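For the "current usage only" case Reuti points to, the linked gridengine.info article (as far as I recall it) sets up equal per-user shares with the functional policy, which does not accumulate past usage the way a share tree does. The settings come down to the three lines below; the ticket value is an example, not a tuned recommendation. The first two belong in the global configuration (edited with qconf -mconf), the last in the scheduler configuration (qconf -msconf). To my knowledge auto_user_fshare only applies to user objects that SGE creates automatically after the change, so users who already exist need their fshare set by hand with qconf -muser <username>.

```
enforce_user                 auto
auto_user_fshare             100

weight_tickets_functional    10000
```

Giving every user the same fshare means each user with pending work gets an equal claim on functional tickets, regardless of how many jobs they have queued.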


--
"He made him ride on the high places of the earth, that he might eat the increase of the fields"
Deuteronomy 32:13

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224784




