[GE users] Limit load on NFS server

leinaddm ddm at bartol.udel.edu
Sat May 16 16:58:30 BST 2009

Hi Jesse,

thanks for your reply. You clarified some thoughts I was mulling over,
and that helped me implement them.

I installed sge on the disk servers so that they could run a load_sensor,
installed the load sensor, and set up the complex on the sge master.
At that point I modified the load_thresholds and the suspend_thresholds
on the queues, and this seems to work.
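
A minimal sketch of a load sensor of the kind described above, assuming
the usual request/report exchange: execd sends a line on stdin for each
load interval, the sensor answers with a begin / host:complex:value /
end block, and a "quit" line ends the sensor. The complex name
"nfs_load" is hypothetical and must match the complex defined on the
master, and the 1-minute load average is only a stand-in for whatever
metric best reflects the load on the disk server.

```python
import os
import socket
import sys


def read_server_load():
    """Return the 1-minute load average of this host (stand-in metric)."""
    return os.getloadavg()[0]


def format_report(host, complex_name, value):
    """Build one report block in the begin / host:name:value / end form."""
    return "begin\n{}:{}:{:.2f}\nend\n".format(host, complex_name, value)


def run_sensor(stdin, stdout, host, complex_name, read_load):
    """Answer each request line until 'quit' or end of input arrives."""
    for line in iter(stdin.readline, ""):
        if line.strip() == "quit":
            break
        stdout.write(format_report(host, complex_name, read_load()))
        stdout.flush()  # the report must not sit in a buffer


if __name__ == "__main__":
    # "nfs_load" is a hypothetical complex name, not one from this thread.
    run_sensor(sys.stdin, sys.stdout, socket.gethostname(),
               "nfs_load", read_server_load)
```

The host name reported has to be the one the master knows the disk
server by, so socket.gethostname() may need replacing on some setups.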

At the same time, to limit the maximum number of jobs started in a
scheduler interval, I changed the job_load_adjustments to
np_load_avg=24 and set load_adjustment_decay_time to 3 minutes.
Such a high number seems to be necessary because (at least in the V61u4
we have on the cluster) the np_load_avg adjustment apparently is added
to the load_avg before the division by the number of cores. So, for our
nodes with 8 cores, using 24 increases the value of np_load_avg by 3.
The result is that one job per node is started in the first scheduling
interval. Then, since the default load_thresholds for queues is
np_load_avg=1.75, after a minute or so another set of jobs can be
started, but by that time the load on the NAS has had time to be
reported back and can, if necessary, prevent the start of further
jobs using the same disk server.
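
The arithmetic behind these numbers can be sketched as follows. This
assumes that the adjustment is divided by the core count as observed
above, that it decays linearly over load_adjustment_decay_time, and that
the real load added by the job itself is ignored. The function names are
only illustrative.

```python
def per_core_adjustment(adjustment, cores):
    """Adjustment as seen in np_load_avg, i.e. divided by the core count."""
    return adjustment / cores


def seconds_until_below(initial, threshold, decay_seconds):
    """Time for a linearly decaying adjustment to drop to the threshold."""
    if initial <= threshold:
        return 0.0
    return decay_seconds * (1.0 - threshold / initial)


# Values from the setup described above: np_load_avg=24, 8 cores,
# load_thresholds np_load_avg=1.75, decay time of 3 minutes.
bump = per_core_adjustment(24, 8)
wait = seconds_until_below(bump, 1.75, 180)
print("np_load_avg bump per started job:", bump)
print("seconds until the node is eligible again:", round(wait))
```

Under these assumptions a node becomes eligible again after about 75
seconds, which is consistent with the "minute or so" seen on the
cluster.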

In principle everything seems to work, but things could be better:
1. It seems that job_load_adjustments can only be specified at the
   scheduler level and not at the queue level. This creates problems
   with another queue on the same cluster, used by people who run MPI
   calculations that do not do much disk access. Having changed the
   job_load_adjustments now prevents the running of MPI jobs that
   require more than 1 slot per node. I removed the load_thresholds
   settings from this queue and this fixed the problem, but it would
   still have been better to be able to switch the job_load_adjustments
   back to the default value for this queue.

2. I think it would be better/more useful to have a parameter (maybe
   per-queue) that specifies the maximum number of jobs to start per
   scheduling interval. Even by tweaking the job_load_adjustments it
   doesn't seem to be possible to start fewer than one job per node.

I would appreciate any comments/ideas on how to limit the number of jobs
started in each scheduling interval without using the

Thanks, Daniel.

* hawson <beckerjes at mail.nih.gov> [05/12/2009 15:21]:
> I've been thinking about this as well.  One thought I had was to create a 
> load_sensor, and set load_thresholds on the queues.  I haven't tested this at 
> all, but perhaps something like this:
> For each NFS server, there would be a complex called something like 
> "nfs_load_SERVERNAME".  This would be updated by a load_sensor, most likely 
> running on the NFS servers.  The queues would then set a load threshold for 
> these complexes, with whatever value is appropriate for your systems. 
> Thus, as the load on the NFS servers rises, the queues would trip the load 
> thresholds, and no new jobs would be dispatched to the queues.  Unfortunately, 
> I suspect that this may cause some problems with load oscillation, as large 
> numbers of jobs are dispatched all at once.  Perhaps if load_adjustments in 
> the scheduler are used this could be avoided.
> The downside to this is that you need to set these thresholds for all queues 
> you care about.  Of course, you could do clever things by making the complex 
> FORCED, so users would have to request it, and thus be slightly aware of the 
> problem.

