[GE users] memory / cpu load threshold

Reuti reuti at staff.uni-marburg.de
Thu Jul 22 23:20:19 BST 2004


>We have an SGE_CELL of 15 machines running SGE 6.0. All the machines are
>dual-CPU Linux hosts, but with different amounts of memory and swap. I
>would like to set a "load threshold" for memory and swap in terms of
>percent free rather than an absolute amount, since the amounts differ
>from machine to machine. For example, if a machine has 4G of RAM and 4G
>of swap and only 1G of memory is available because some other job is
>running on it, then SGE should not use this machine for a new job. I can
>do this by specifying the memory threshold as an amount, but I would
>like to do it by percentage: say, if only 25% of memory is available,
>do not dispatch new jobs to this machine, using the load threshold
>setting in Qmon. Is there a way to do this?

You could write your own load sensor, which can do the calculation in any 
way you like. You could then use the values it returns as load thresholds.
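
A minimal sketch of such a load sensor, in Python. It assumes the usual 
load sensor protocol (execd writes a line to the sensor's stdin at each load 
interval, the sensor answers with a begin/end block of host:name:value 
lines, and "quit" ends it) and a Linux /proc/meminfo. The names 
mem_free_pct and swap_free_pct are made up for this example; they would 
have to be added to the complex configuration and the script registered as 
load_sensor before they can be used in load_thresholds.

```python
#!/usr/bin/env python3
"""Sketch of an SGE load sensor reporting free memory and swap in percent."""
import socket
import sys


def parse_meminfo(text):
    """Return the numeric 'Key: value kB' fields of /proc/meminfo as a dict."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        # Skip header or summary lines that do not carry a plain number.
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def percent_free(free_kb, total_kb):
    """Free share in percent. Assumption: a host without any swap reports
    100, so that a swap threshold never puts it into alarm."""
    if total_kb == 0:
        return 100.0
    return 100.0 * free_kb / total_kb


def report(host, info):
    """Build one begin/end block. mem_free_pct and swap_free_pct are
    hypothetical complex names, not predefined SGE attributes."""
    mem = percent_free(info.get("MemFree", 0), info.get("MemTotal", 0))
    swap = percent_free(info.get("SwapFree", 0), info.get("SwapTotal", 0))
    return "\n".join([
        "begin",
        f"{host}:mem_free_pct:{mem:.1f}",
        f"{host}:swap_free_pct:{swap:.1f}",
        "end",
    ])


def main():
    # Assumption: this is the host name under which SGE knows the machine.
    host = socket.gethostname()
    while True:
        line = sys.stdin.readline()
        if not line or line.strip() == "quit":
            break
        with open("/proc/meminfo") as fh:
            print(report(host, parse_meminfo(fh.read())), flush=True)


if __name__ == "__main__":
    main()
```

With the two values defined as complexes, a threshold on "less than 25% 
free" would be the same on every host, regardless of how much RAM it has. 
Note that MemFree does not count buffers and cache, so you may want to add 
those in before computing the percentage.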

>The other question is regarding CPU utilization. As all the machines have
>2 CPUs, I have defined 2 slots for each machine so that 2 jobs can run
>simultaneously on a machine. I have set a load threshold on
>"np_load_avg" with the value 0.7. Sometimes a user runs a job on one of
>the machines without using SGE and uses just one CPU at 80%. If another
>user then submits a job asking for 4 CPUs through SGE, SGE treats this
>machine as a good resource and starts two processes on it and the other
>two on another machine. Ideally it should have used only one CPU from
>the first machine. Eventually all users will submit their jobs through
>SGE, but I was wondering if there is a way to define that if one of the
>CPUs is busy (because of a non-SGE job) then only one slot from that
>machine is used.

A "np_load_avg" threshold of 0.7 means a total load of 1.4 on dual-CPU 
machines, because np_load_avg is the load average divided by the number of 
CPUs installed. You could use the absolute value "load_avg" instead of the 
per-CPU average. On the other hand: what is specified in your parallel 
environment? There you can select $round_robin instead of $fill_up as the 
allocation rule, to distribute the slots of a parallel job in another way.
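
To make the arithmetic concrete, here is a small Python sketch. The 
function names are only for illustration, and the ">=" comparison is an 
assumption about how the threshold is evaluated for these attributes.

```python
def np_load_avg(load_avg, num_proc):
    """Load average normalized by the number of CPUs in the host."""
    return load_avg / num_proc


def threshold_exceeded(value, threshold):
    """Assumption: the threshold counts as exceeded when value >= threshold."""
    return value >= threshold


# One non-SGE job keeping one CPU of a dual-CPU host 80% busy.
load_avg = 0.8
num_proc = 2

# Normalized: 0.8 / 2 = 0.4, which stays below the 0.7 threshold.
print(threshold_exceeded(np_load_avg(load_avg, num_proc), 0.7))

# Absolute: 0.8 is already above a load_avg threshold of 0.7.
print(threshold_exceeded(load_avg, 0.7))
```

So with np_load_avg at 0.7 the host still looks idle to the scheduler while 
one CPU is taken, whereas a threshold on load_avg reacts as soon as 
roughly one CPU is in use.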

Szia - Reuti
