[GE users] Load adjustment (RE: [GE users] how to throttle jobs into a queue)

Kogan, Felix Felix-Kogan at deshaw.com
Wed Aug 29 17:05:52 BST 2007


Load adjustment is a good tool for this purpose; we use it, too. But it
is too indiscriminate: while it helps in situations like this one, it is
harmful for PEs. Try to allocate more than one slot per machine in a PE
when load adjustment is enabled!

I think there should be a way to turn load adjustment off for PEs, or it
should be a queue-specific option rather than a global one.
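
For illustration (numbers of my own, not from this thread): with the
default job_load_adjustments of np_load_avg=0.50, a queue threshold of
np_load_avg=1.75 and a 2-CPU host, each slot the scheduler grants adds
roughly 0.50 / 2 = 0.25 to the adjusted np_load_avg. After about seven
granted slots the host looks overloaded even if it is actually idle, so
a PE asking for more slots than that on one machine cannot get them
until the adjustment decays.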

--
Felix

-----Original Message-----
From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM] 
Sent: Friday, August 24, 2007 3:59 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] how to throttle jobs into a queue

David,

Actually, you were right the first time.  It does bump the load average
by the load adjustment amount.  In the email you linked, they're talking
about using load_avg instead of np_load_avg.  np_load_avg = load_avg /
<num_cpus>, which is where the dividing by the CPUs came from.

In my tests, I did the following:

slots                         3
load_thresholds               np_load_avg=1.75
job_load_adjustments          np_load_avg=4
load_adjustment_decay_time    0:7:30
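
(For anyone reproducing this: slots and load_thresholds are queue
parameters, while job_load_adjustments and load_adjustment_decay_time
live in the scheduler configuration, so the edit is split across two
qconf calls, roughly:

    qconf -mq all.q      # slots, load_thresholds
    qconf -msconf        # job_load_adjustments, load_adjustment_decay_time

where all.q stands in for whatever queue you test against.)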

In 6.0u10 and 6.1u2, submitting three jobs results in one running and 
the other two waiting because the queue is overloaded.  In 6.1, 
submitting three jobs results in all three running.  At least in my 
one-node Solaris Nevada AMD64 test grids.

Daniel

david zanella wrote:
> I'm using 6.1beta (I think). I picked it up just days before the official
> release, so I suspect it might be the official 6.1 version (at least
> that's what it says in messages:
>  
> 07/31/2007 09:38:28|schedd|hsrnfs-101|I|starting up N1GE 6.1 (sol-sparc64)
>
> At any rate, I did some more googling and found this:
>
>
> http://bioinformatics.org/pipermail/bioclusters/2004-December/002146.html
>
> Which gave me the idea to totally redline np_load_avg. qconf -msconf:
>
> job_load_adjustments              np_load_avg=100
> load_adjustment_decay_time        00:15:00
>
> It works!
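>
> (Presumably any adjustment comfortably above the 1.75 alarm would do:
> with np_load_avg=100, the first job dispatched to a host pushes its
> adjusted load far past the threshold by itself, so each host accepts
> roughly one new job per 15-minute decay window.)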
>
> qstat -j <jobnum> says:
>
> queue instance "cc32 at crush.mayo.edu" dropped because it is overloaded:
> np_load_avg=2.031982 (= 0.125732 + 100 * 0.610000 with nproc=1) >= 1.75
>
> So, the documentation isn't all that clear. I figured that it "bumped"
> the np_load_avg by the job_load_adjustments amount. Instead, it put
> that value into the np_load_avg equation, THEN divided by the number of
> CPUs.
>
> Based on your note below, it's very likely that the calculation was off
> in 6.1 and fixed in u2. I'll get u2 downloaded and see if there is any
> difference in the calculation.
>
> I don't care either way...I've got the cluster to throttle incoming
> jobs, so I'm happy.
>
>
>   
>> I installed a 6.1u2 cluster, and load adjustments appear to work again.
>> I also went through the internal issue tracker, and I can't find any
>> mention of this issue, so it must have been silently (or accidentally)
>> fixed with u1.  (u2 is a *very* minor release.)  A clean install of 6.1
>> has the problem, and a clean install of 6.1u2 does not.
>>
>> Daniel
>>
>> Daniel Templeton wrote:
>>     
>>> David,
>>>
>>> Are you using 6.1?  I just tried the same thing with my 6.1 cluster,
>>> and it also had no effect.  I tried the same thing with my 6.0u10
>>> cluster and it worked.  I'm now downloading the latest 6.1u2 binaries,
>>> to try it there as well.  I don't see an issue listed for the problem,
>>> but it may have been fixed in an update release nonetheless.
>>>
>>> Daniel
>>>
>>> david zanella wrote:
>>>       
>>>> I agree that this will probably work, but it isn't exactly what I'm
>>>> looking for.
>>>>
>>>> In my case, the users are submitting several thousand jobs at a time.
>>>> They cannot predict (or don't want to take the time to) how much
>>>> memory a job will use. If they flag each job as using 2G of memory,
>>>> then the consumable resource will run out at 15 or 16 jobs. Using my
>>>> current load thresholds, I'm getting 22-27 jobs on each server. I
>>>> lose a lot of throughput if I do this.
>>>>
>>>> Using qconf -msconf and changing job_load_adjustments from
>>>> np_load_avg=0.5 to np_load_avg=2.0 with a load_adjustment_decay_time
>>>> of 15 minutes *SHOULD* do it (man sched_conf)...but it doesn't seem
>>>> to be having any effect. That is, upon each job submission, it should
>>>> artificially increase the np_load_avg to 2.0 (the alarm is set at
>>>> 1.75) and then decay that adjustment over 15 minutes. That should
>>>> give the job enough time to ramp up, start using memory, and trip my
>>>> memory and swap triggers.
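>>>>
>>>> (As a sketch, the settings described above would look roughly like
>>>> this in the scheduler configuration, edited with qconf -msconf:
>>>>
>>>>     job_load_adjustments          np_load_avg=2.00
>>>>     load_adjustment_decay_time    00:15:00
>>>>
>>>> against queues whose load_thresholds include np_load_avg=1.75.)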
>>>>
>>>>
>>>>
>>>> ------------- Begin Forwarded Message -------------
>>>>
>>>> From: "Kogan, Felix" <Felix-Kogan at deshaw.com>
>>>> To: <users at gridengine.sunsource.net>
>>>> Subject: RE: [GE users] how to throttle jobs into a queue
>>>>
>>>> I've had the same problem and came up with the following solution
>>>> (still in testing phase):
>>>>
>>>> o Make mem_free a requestable and consumable attribute
>>>>
>>>>     $ qconf -sc
>>>>     #name        shortcut  type    relop  requestable  consumable  default  urgency
>>>>     #-------------------------------------------------------------------------------
>>>>     ...
>>>>     mem_free     mf        MEMORY  <=     YES          YES         0        0
>>>>     ...
>>>>
>>>> o Set the resource value to the real amount of RAM for each node
>>>>  
>>>>     qconf -mattr exechost complex_values mem_free=32G hostname.foo.bar.com
>>>>
>>>> Once this is done, users can use "-l mem_free=2G" to really reserve
>>>> 2GB of RAM. The mem_free reading of the host where such a job runs
>>>> will show 2GB less free memory. If the job in fact consumes 2.5GB,
>>>> mem_free will reflect that; i.e. SGE uses the smaller of the two
>>>> values, the one calculated from its internal accounting and the one
>>>> received from the load sensor. This works for all other standard or
>>>> custom requestable and consumable attributes, as long as a load
>>>> sensor reports them (e.g. you can set this up for /var/tmp space).
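>>>>
>>>> For example, a job requesting against this consumable would be
>>>> submitted along the lines of
>>>>
>>>>     qsub -l mem_free=2G ./my_job.sh
>>>>
>>>> where ./my_job.sh is just a placeholder for the user's actual job
>>>> script.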
>>>>
>>>>
>>>> Hope that helps.
>>>>
>>>> -- 
>>>> Felix Kogan
>>>>
>>>> -----Original Message-----
>>>> From: david zanella [mailto:zanella at mayo.edu]
>>>> Sent: Friday, August 24, 2007 11:46 AM
>>>> To: users at gridengine.sunsource.net
>>>> Subject: [GE users] how to throttle jobs into a queue
>>>>
>>>>
>>>> I have a group of users that are submitting jobs to my grid.  The
>>>> jobs do some sort of pedigree/chromosome calculations. It is
>>>> impossible for the users to predict or control the amount of memory
>>>> each job will use. Consequently, some jobs start out small, grow to
>>>> about 2G in size, and run for weeks, while other jobs can be as
>>>> small as a few hundred meg and finish up in an hour.
>>>>
>>>> I have set up load thresholds that will suspend job submission if
>>>> the available mem_free < 2G or swap_used > 6G.  For the most part,
>>>> this works well.  I have 7 T2000's for execute hosts.
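>>>>
>>>> (In the queue configuration that presumably reads something like
>>>>
>>>>     load_thresholds    mem_free=2G,swap_used=6G
>>>>
>>>> per qconf -sq on each queue; the exact line is my guess, the values
>>>> are the ones above.)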
>>>>
>>>> Here's the problem:
>>>>
>>>> My T2000's have 32G of memory and I have 30 slots for each. With the
>>>> load thresholds in place, say the server is only running 20 jobs. A
>>>> job completes and the server is now below its load threshold. The
>>>> qmaster sees this and immediately shoves 11 jobs at the server.
>>>> Pretty soon, the jobs grow, I run out of memory and swap, and jobs
>>>> start crashing.
>>>>
>>>> What I need is some way to throttle the acceptance rate to the
>>>> server: to tell the server to accept one job, then re-evaluate in,
>>>> say, 15 or 30 minutes. If the load thresholds give a green light,
>>>> it'll accept another job.
>>>>
>>>> I've looked at sched_conf, and it has what appears to be what I
>>>> need. I've made various adjustments to job_load_adjustments and
>>>> load_adjustment_decay_time, but these have not had any effect.
>>>>
>>>> Am I missing something? Is there a better way to accomplish what I'm
>>>> trying to do?
>>>>
>>>>
>>>>
>>>>
>>>> ------------- End Forwarded Message -------------
>>>>
>>>>
>>>>
>>>>   
>>>>         
>>>       
>>     
>
>
>   

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net





More information about the gridengine-users mailing list