[GE users] Allocating CPUs team wise

Marconnet, James E Mr /Computer Sciences Corporation james.marconnet at smdc.army.mil
Wed Mar 16 19:32:08 GMT 2005


Thanks for sharing, especially the link to some of your work shared on the
web. I worked up the directory structure a little and read thru your
interesting article: "Building and managing production bioclusters".  Deep
within it I found something that echoed my recent sentiments exactly: 

"There is relatively little information available in the SGE product
documentation or published materials that covers advanced Grid Engine
deployment, configuration and support strategies. Managing very large and
complex deployments seems to be a bit of a 'black art' that only a few have
really mastered. For novice cluster operators, attempting to manage or
install a sophisticated Grid Engine configuration can be an adventure in
both terminology and technology as layers of 'subordinate queues' and
interlocking chains of 'policies', 'complexes', 'ticket tokens' and 'load
sensors' become bewildering very quickly."

The documentation I've seen so far mostly explains what the various
switches and options mean and a little about HOW to do something, but it
says very little about WHY, much less suggesting a logic path for
deciding what makes sense to do SGE administration-wise.

Fortunately I have someone else who did our SGE installation and it works.
But then they faded away and have not been available much since. Good to
have emergency backup, but I feel really lonely/unsupported in the day to
day operations. And hiring an expert is out of the question.

In trying to support our users and to get the technical work accomplished
using the cluster we have, we have come up with a dizzying set of group
"agreements", like a 4-hour wall clock runtime limit on certain queues and
special priority assignment of certain nodes (moving them from queue to
queue) to help meet certain deadlines. In the process we have set up
multiple queues for our 3 different groups, with both "primary" and
"secondary" queue node allocations for each group, plus a special queue in
one group for people who want to run 4 jobs per Penguin node instead of the
general 3. We may yet set up different queues for different subgroups with
more/less priority. We are just starting to put in the subordinate queues so
we don't get so many jobs running on each node because users from different
groups used different group queues to submit their jobs. All of this makes
me wonder: "Is this really the way we should go?"

So far I have no good basic queue approach/setup guidance other than what
I've found archived here and that which I have received in response to my
recent questions here in this group (thanks!). I keep thinking that someone
here will suggest a good book, but that has not happened so far. SGE
Administration for Dummies??

Again, thanks for sharing some of what you have learned. Looking forward to
piecing things together over some time, a tidbit at a time, queue-wise.

Jim Marconnet

-----Original Message-----
From: Chris Dagdigian [mailto:dag at sonsorol.org] 
Sent: Wednesday, March 16, 2005 5:07 AM
To: users at gridengine.sunsource.net; nr at fluent.co.in
Subject: Re: [GE users] Allocating CPUs team wise

Hi Nitin,

SGE 6 has excellent mechanisms for doing resource allocation, especially
when it comes to groups of users, departments or projects.

There are a number of ways you can get what you are requesting - for
instance if you really wanted to be strict about which CPUs each of your
team members uses, you could take advantage of the Access Control Lists
and XACLs that Grid Engine offers. This would involve:

o Configuring SGE to be aware of each user and making a usergroup object for
each "team". Then you could use SGE's ACL and XACL mechanisms to control at
the queue level who is allowed to run on each queue instance. You could
also do this with SGE Department or SGE Project objects.

All this is well covered in the SGE documentation collection:

This would get you strictly what you want but is not extremely flexible and
would require effort on your part to constantly adjust and move around the
ACLs to reflect current needs.
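
To make the ACL/XACL idea concrete, here is a minimal Python sketch of how
a queue-level allow list and deny list could interact. It is a toy model,
not Grid Engine code. It assumes the behaviour I understand the queue
attributes user_lists (allow) and xuser_lists (deny) to have: a deny entry
always wins, and an empty allow list means everyone not denied may run.
The function name, team names and user names are all hypothetical.

```python
# Toy model of queue-level access control; this is NOT Grid Engine code.
# The keys mirror the user_lists / xuser_lists queue attributes, and the
# precedence rules below are an assumption stated in the text above.

def may_run(user, queue, access_lists):
    """Return True if `user` may run jobs in `queue` under the toy rules."""
    def listed(names):
        # True if the user belongs to any of the named access lists.
        return any(user in access_lists.get(n, set()) for n in names)

    if listed(queue.get("xuser_lists", [])):
        return False                      # an explicit deny always wins
    allowed = queue.get("user_lists", [])
    return not allowed or listed(allowed)  # empty allow list = everyone


# Hypothetical teams and queues, for illustration only.
access_lists = {"teamA": {"alice", "bob"}, "teamB": {"carol"}}
queue_a = {"user_lists": ["teamA"], "xuser_lists": []}
open_queue = {"user_lists": [], "xuser_lists": ["teamB"]}

print(may_run("alice", queue_a, access_lists))
print(may_run("carol", queue_a, access_lists))
```

Moving CPUs between teams in this model means editing which lists are
attached to which queue, which is exactly the ongoing manual effort
described above.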

A more "flexible" approach may be to use the SGE Sharetree or Functional
Share policies to do roughly the same thing. Using functional shares it is
easy to set up policies that do something like this:

  o when the cluster is idle; any team can run on any CPU
  o when cluster is busy; Team A gets 30% of cluster resources
  o when cluster is busy; Team B gets 70% of cluster resources

The sharetree or functional policy may allow you to allocate cluster
resources in a way that is more efficient and easier for you to manage on a
day-to-day basis. It is certainly less "work" for an admin than having to
constantly manipulate access control lists.
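
The 30%/70% behaviour can be illustrated with a small Python sketch. This
is a deliberately simplified model and not the actual SGE ticket
algorithm: it assumes each free slot goes to the team, among those with
pending jobs, that is furthest below its target share. The function name
and the slot counts are hypothetical.

```python
# Simplified share-based dispatch; NOT the real SGE scheduler algorithm.
# Assumption: each free slot goes to the team with pending work that is
# furthest below its target fraction of the running jobs.

def fill_slots(shares, pending, slots):
    """Hand out `slots` job slots and return the running count per team."""
    pending = dict(pending)                  # do not modify the caller's dict
    running = {team: 0 for team in shares}
    total_share = sum(shares.values())

    for _ in range(slots):
        candidates = [t for t in shares if pending.get(t, 0) > 0]
        if not candidates:
            break                            # nothing left to run
        total_running = sum(running.values()) or 1

        def deficit(team):
            target = shares[team] / total_share
            actual = running[team] / total_running
            return target - actual

        chosen = max(candidates, key=deficit)
        running[chosen] += 1
        pending[chosen] -= 1
    return running


shares = {"A": 30, "B": 70}
# Busy cluster: both teams have more pending jobs than there are slots.
print(fill_slots(shares, {"A": 100, "B": 100}, 10))
# Idle from Team B's side: Team A may use every slot.
print(fill_slots(shares, {"A": 100, "B": 0}, 10))
```

Note that nothing is reserved here: the shares only matter when teams
compete for the same slots, which is what makes this approach less work
to maintain than per-queue ACLs.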

The hard part about the functional/share-tree policy is actually explaining
the system to end users, management and "stakeholders". In my experience
people always want constant assurance that they are not being "cheated" out
of their rightful share of compute power. ACLs and how they behave are
easily monitored by end users; the resource allocation policies are a bit
more hidden as they operate within the scheduler and work only to adjust the
priority of jobs sitting in the pending list.

It's not totally appropriate to your situation but in the past when I had to
divide a cluster up for usage among different departments I took careful
notes and threw it up on the web. There may be useful information there:


Nitin Raina wrote:
> Hello All,
> We are using SGE 6.0 for submitting serial and parallel jobs across
> our cluster. This cluster is used by various teams. We have created two
> queues, each with a set of users bound to it who can run jobs on the
> respective queue.
> Beyond this arrangement, we are thinking of a way to merge both queues
> and in return create an arrangement where we can allot CPUs team-wise.
> We would like to have, let's say, 10 CPUs dedicated to Team A, 8 CPUs
> dedicated to Team B, and so on.
> Is that possible? If yes, can we make this arrangement dynamic, so
> that I can reduce and increase the count of CPUs as the need arises?
> Any help would be much appreciated.

To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net

