[GE users] Managing the Queue + Nodes
scu98rkr at gmail.com
Tue Nov 3 10:02:07 GMT 2009
Thanks for your ideas, but I seem to be having a similar problem
following your instructions as I get when reading the manual: I
don't really know what they mean.
First you suggest:
`qconf -msconf`: max_reservation 25
What does this mean? What is a reservation? Is it the number of jobs queued?
qsub -R y ...
Sorry, I really have no idea what this flag does.
Also, what is a share tree?
> Am 02.11.2009 um 17:12 schrieb scu98rkr:
>> We have a 64 node cluster consisting of 32 dual node machines. There are
>> about 3 users. 2 of the users run single processor jobs that usually
>> last between 7 hours and 1-2 days, and they tend to queue up a batch of
>> test cases, i.e. up to 70 jobs each.
>> I'm running Gaussian and run many different types of jobs, from 1-2 hour
>> runs to composite calculations lasting several days. Recently I've started
>> running dual processor open mp2 jobs. I tend to just run a few jobs at a
>> time, although I occasionally will run batches.
>> We've never really come to a satisfactory conclusion on how to manage
>> the resources most efficiently. Quite often 1-2 users will not be
>> using the nodes, so I want all of the resources open to everyone. I've set
>> up the share policy to 33% each, so queued jobs will be ordered according
>> to how much computing power each user is using on the cluster. Which is
>> good, but it still means the user with the fewest jobs has to wait until
>> the previous jobs have finished before their (possibly 1 hour) job
>> will run.
>> Also, as I mentioned earlier, I've started running dual processor jobs.
>> I've just come back after the weekend to find none of my jobs have run,
>> even after being at the front of the queue, because at no point have 2
>> nodes on the same machine been free (rather unsurprisingly). (I can
>> pretend the job only uses 1 processor, but I've noticed that if you
>> specify 2 processors and someone else starts a job on the same machine,
>> the computation time becomes much slower than if you'd specified 1.)
>> What I really need SGE to do is monitor the usage of each user and
>> check if any user is using more than 33% of the cluster. If there are
>> currently any other jobs queued, it needs to suspend the jobs of the user
>> who is over 33% and replace them with the queued jobs. SGE doesn't seem to
>> have any problem suspending jobs, so can it run other jobs in that
>> suspended space?
>> I don't want to limit people's access to queues because I want the whole
>> cluster available to 1 user if there is space.
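The policy asked for above can be written down as a small check, which may make it easier to discuss. This is only a sketch of the wished-for logic in Python; it is not an SGE feature or API, and the function name, the user names and the job counts are all made up for illustration.

```python
from collections import Counter

def users_over_share(running, queued, total_slots, share=1/3):
    """Return users whose running slots exceed `share` of the cluster
    while some OTHER user has jobs waiting.

    running / queued: lists of (user, slots) tuples (hypothetical data).
    """
    used = Counter()
    for user, slots in running:
        used[user] += slots
    waiting_users = {user for user, _ in queued}
    limit = share * total_slots
    return sorted(
        user for user, slots in used.items()
        # Over the limit AND somebody else is waiting for space.
        if slots > limit and (waiting_users - {user})
    )

# Hypothetical snapshot of a 64-slot cluster.
running = [("alice", 1)] * 50 + [("bob", 1)] * 14
queued = [("carol", 2)]
print(users_over_share(running, queued, total_slots=64))  # ['alice']
```

With an empty queue the function returns an empty list, which matches the wish that one user may fill the whole cluster when nobody else is waiting.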
> a) as I read between the lines, you defined a PE with allocation_rule
> $pe_slots and request two slots. What you need to avoid serial jobs
> slipping in, is to turn on resource reservation in the scheduler
> configuration `qconf -msconf`:
> max_reservation 25
> (or a more appropriate value)
> and submit jobs with: qsub -R y ...
> b) a share tree will honor the past usage, but AFAICS you request
> only to honor the usage of the cluster right now:
> -- Reuti
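To illustrate point a), here is a toy model of why a 2-slot job can starve and what a reservation changes. It is a sketch under invented assumptions (a single 2-slot machine, serial jobs always waiting, made-up runtimes) and not how the SGE scheduler is actually implemented; the `reserve` flag is a hypothetical stand-in for the combined effect of a non-zero max_reservation and submitting with `qsub -R y`.

```python
def start_time_of_parallel_job(reserve, horizon=50, serial_runtime=4):
    """Toy model of ONE 2-slot machine. Returns the time step at which
    the waiting 2-slot job starts, or None if it never starts within
    `horizon` steps. All numbers are invented for illustration.
    """
    # Remaining runtime of the job in each slot; 0 means the slot is free.
    # Two serial jobs are already running and finish at different times.
    slots = [1, 3]
    for t in range(horizon):
        free = [i for i, left in enumerate(slots) if left == 0]
        if len(free) == 2:
            return t  # both slots free at once: the 2-slot job starts
        if free and not reserve:
            # No reservation: a waiting serial job slips into the free slot.
            for i in free:
                slots[i] = serial_runtime
        # With a reservation the free slot is simply left idle.
        slots = [max(0, left - 1) for left in slots]
    return None

print(start_time_of_parallel_job(reserve=False))  # None
print(start_time_of_parallel_job(reserve=True))   # 3
```

Without the reservation the two slots never become free at the same moment, which is the situation described for the weekend. With it, one slot sits idle for a short while and the 2-slot job then starts.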
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
"He made him ride on the high places of the earth, that he might eat the increase of the fields"