[GE users] load/suspend Thresholds problem

Rene Salmon rsalmon at tulane.edu
Thu Sep 23 14:51:53 BST 2004




Thanks, that fixed the problem. Now the host stops accepting jobs
once the load is too big.
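
For anyone who finds this thread later, here is a rough sketch of the arithmetic in Jeff's explanation quoted below. The function names are made up for illustration and are not part of SGE, and the strict "greater than" comparison is an assumption based on Jeff's wording that the load has to be "over" the limit.

```python
# Illustrative sketch only: these helpers are hypothetical, not SGE code.

def np_load_avg(load_avg: float, num_proc: int) -> float:
    """Normalize a raw load average (as shown by uptime) by processor count."""
    return load_avg / num_proc


def raw_load_limit(np_threshold: float, num_proc: int) -> float:
    """Raw load average that corresponds to a given np_load_avg threshold."""
    return np_threshold * num_proc


def exceeds_threshold(load_avg: float, num_proc: int, np_threshold: float) -> bool:
    """True when the normalized load is over the threshold (assumed strict '>')."""
    return np_load_avg(load_avg, num_proc) > np_threshold


if __name__ == "__main__":
    # np_load_avg=1.75 on a two-processor host: raw load must pass 3.5.
    print(raw_load_limit(1.75, 2))
    # To stop at a raw load of 1.75 on two processors, use 1.75 / 2.
    print(np_load_avg(1.75, 2))
    # A raw load of 3.0 is under the old threshold but over the new one.
    print(exceeds_threshold(3.0, 2, 1.75), exceeds_threshold(3.0, 2, 0.875))
```

With the original setting, a raw load of 3.0 on a two-processor host normalizes to 1.5 and stays under 1.75, so the host keeps taking jobs; with 0.875 the same load is over the limit.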

Rene


On Wed, Sep 22, 2004 at 03:45:28PM -0700, Beadles, Jeff wrote:
> 
> 
> np_load_avg is normalized by the number of processors.  Thus, a threshold of 1.75 on a two-processor system means that the load has to be over 3.5 before the host stops accepting new jobs.
> 
> It sounds like you want np_load_avg to be set to something like 0.875 (1.75 / 2).
> 
> Regards,
> 	-Jeff
> 
> 
> -----Original Message-----
> From: Rene Salmon [mailto:rsalmon at tulane.edu] 
> Sent: Wednesday, September 22, 2004 10:41 AM
> To: users at gridengine.sunsource.net
> Subject: [GE users] load/suspend Thresholds problem
> 
> 
> Hi,
> 
> I am running SGE 6.0u1 on AMD64.  Here is the setup:
> I have two dual-processor machines, compute-0-0 and crash;
> both are execution hosts.
> I have two cluster queues "all.q" and "qtest"
> and two queue instances "compute-0-0.local" and "crash.local".
> 
> This is what it looks like:
> all.q at compute-0-0.local
> all.q at crash.local
> qtest at compute-0-0.local
> qtest at crash.local
> 
> 
> Both cluster queues and their queue instances have
> 
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> 
> 
> But for some reason the hosts do not stop accepting jobs once the threshold
> is reached. 
> 
> 
> The load on each host is about 3.64 and each host is running
> about 3 jobs, but they still keep accepting more jobs.  The hosts do
> not start rejecting jobs even after the system load is above 1.75.
> 
> >qstat
> 
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>      25 0.56000 MyJob      rsalmon      r     09/22/2004 12:08:37 all.q at compute-0-0.local            1
>      26 0.56000 MyJob      rsalmon      r     09/22/2004 12:08:37 all.q at compute-0-0.local            1
>      24 0.56000 MyJob      rsalmon      r     09/22/2004 12:08:37 all.q at crash.local                  1
>      27 0.56000 MyJob      rsalmon      r     09/22/2004 12:08:37 all.q at crash.local                  1
>      29 0.56000 MyJob      rsalmon      r     09/22/2004 12:08:37 qtest at compute-0-0.local            1
>      28 0.56000 MyJob      rsalmon      r     09/22/2004 12:08:37 qtest at crash.local                  1
>      30 0.56000 MyJob      rsalmon      r     09/22/2004 12:10:52 qtest at crash.local                  1
> 
> 
> >qhost
> 
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> compute-0-0             lx24-amd64      2  1.52    1.9G  174.1M  996.2M     0.0
> crash                   lx24-amd64      2  1.69    1.9G  262.8M    5.3G     0.0
> 
> 
> qhost reports a load of about 1.69  but the actual load on the system 
> is 3.64 (from uptime).  Any ideas?
> 
> 
> This was working fine when I only had one cluster queue, "all.q".
> The problem started after I added the second cluster queue, "qtest".
> 
> Thank you for any help
> Rene
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 

-- 
	Rene Salmon
	Tulane University
	Center for Computational Science
	Richardson Building 310
	New Orleans, LA 70118
	http://www.ccs.tulane.edu
	Tel 504-862-8393
	Fax 504-862-8392





