[GE users] user loads

Mag Gam magawake at gmail.com
Tue Sep 16 02:02:24 BST 2008



Chris:

Thank you very much for this advice. I have been playing around with
SGE for about 2 weeks now, and I have also tried other 'engines' and
schedulers. SGE is by far the best, and the community is extremely
friendly and willing to help.

I agree with your point about a "perfect" configuration; I don't think
one exists. I have been reading a lot of docs, especially
http://docs.sun.com/app/docs/doc/817-5677/6ml49n2bt?a=view about fair
scheduling, and it's utterly confusing. The pictures are intimidating,
so I prefer actual commands to better understand what happens 'under
the hood'.

My users are able to execute jobs with no problems; it's just a matter
of setting up everything correctly. I like the alarm state feature you
mentioned, but unfortunately the link you provided isn't much use to
me. All I see there is a message post, with no guide or documentation.
Perhaps you mistakenly posted the wrong link?


Also, I am using the binary 6.1 version. I will move to 6.2 if needed.
Overall, 6.1 seems very nice to work with.

TIA
P.S. Enjoy your business travel :-)




On Mon, Sep 15, 2008 at 8:33 PM, Chris Dagdigian <dag at sonsorol.org> wrote:
>
> I think that what you are looking for is already built into Grid
> Engine. Read the SGE documentation and queue configuration guides and look
> particularly for "load alarm threshold" ... the basic idea is that SGE has a
> built-in protection mechanism against overworking the compute nodes. The nodes
> periodically report their load averages, and if the value exceeds a certain
> threshold the queue instance trips into 'alarm' state and will not take new
> work even if there are job slots free. The nice thing about this feature is
> that the load alarm clears automatically when the load drops on the remote
> node.
>
> Even if you have no resource quotas and no resource allocation policies in
> effect, your system will still be protected from overload by this mechanism.
>
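To make the load alarm idea above concrete, here is a minimal Python
sketch of the decision it describes. It is not SGE code: the
QueueInstance class and its fields are hypothetical, and the 1.75
threshold on the per-CPU load average is an assumption, chosen to
resemble the stock load_thresholds setting (np_load_avg=1.75). Check
your own queue configuration for the real value.

```python
from dataclasses import dataclass

# Assumed threshold on the per-CPU load average (illustrative value only).
LOAD_THRESHOLD = 1.75


@dataclass
class QueueInstance:
    """Hypothetical model of one queue instance on one exec host."""
    host: str
    cpus: int
    slots_total: int
    slots_used: int = 0
    load_avg: float = 0.0  # last load average reported by the host

    @property
    def np_load_avg(self) -> float:
        # Load average normalized by the number of processors.
        return self.load_avg / self.cpus

    @property
    def in_alarm(self) -> bool:
        # Recomputed from the latest report, so the alarm clears by
        # itself as soon as the reported load drops below the threshold.
        return self.np_load_avg > LOAD_THRESHOLD

    def accepts_new_job(self) -> bool:
        # A free slot is not enough: the instance must also be out of alarm.
        return not self.in_alarm and self.slots_used < self.slots_total


q = QueueInstance(host="node01", cpus=4, slots_total=4, slots_used=1)

q.load_avg = 9.0            # 9.0 / 4 = 2.25, above the threshold
busy = q.accepts_new_job()  # False, even though 3 slots are free

q.load_avg = 2.0            # 2.0 / 4 = 0.5, back under the threshold
idle = q.accepts_new_job()  # True, the alarm cleared on its own
```

The point of the sketch is that free slots and load are checked
separately, which is why an overloaded node stops taking work without
any quota or policy being configured.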
> As someone who regularly builds SGE systems and implements policies for
> others I've got some advice for you ...
>
> - Don't make the mistake of trying to get your SGE configuration "perfect"
> the first time around. This is generally impossible -- you'll never really
> know the finer points of how your system should be configured and tuned
> until some time after you have turned real users loose and started doing
> real work on the system.
>
> My rule of thumb for new SGE projects goes like this:
>
> 1. Collect requirements from IT admins and end-users, "translating" SGE
> capabilities if needed. If they can't describe what the cluster should do
> for them or can't really understand SGE without using it then I will usually
> deploy SGE in the default mode with a simple "fairshare by user" policy.
> This is a nice simple way to expose SGE to users and everyone understands
> and appreciates fairshare-by-user when you tell them "the scheduler will
> work to make sure everyone gets a fair and equal share of available
> resources"
>
> 2. Implement a "best guess" SGE configuration and open up the cluster to
> users during this "beta" or "trial" period.
>
> 3. After a few weeks or a month or so, go back and talk to users and
> operators and see what they like and (most importantly) don't like about the
> system.
>
> 4. Based on feedback from step #3, start refining your configuration, making
> policy changes or adding resource quotas as needed.
>
>
> I have not had time to read all of your most recent messages (business
> travel) but if you are really deploying this system for the first time I
> would not try to get too tricky and/or complicated right at the beginning.
>
> You would be well served by:
>
> - Installing SGE with most of the default options enabled
> - Installing a simple "fairshare by user" policy
> (http://gridengine.info/2006/01/17/easy-setup-of-equal-user-fairshare-policy )
> - If you are using SGE 6.2 you should re-enable the schedd_job_info
> parameter
> (http://article.gmane.org/gmane.comp.clustering.gridengine.users/11768)
>
> Once you have the basics up and running you can experiment with queue
> settings and resource quotas as needed but I'd recommend just running in a
> basic mode for as long as it takes for you to be able to characterize your
> requirements and your particular application workflow.
>
>
>
> Regards,
> Chris
>
>
>
>
>
>
> On Sep 15, 2008, at 8:14 PM, Mag Gam wrote:
>
>> Hello All,
>>
>> As many of you know we are putting together a GRID at my university's
>> engineering lab. I wanted to know if we can throttle a user's job
>> depending on the load of the system. Let's say I have 16 servers and I
>> would like to submit a job. Each of these servers is an exec host.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
