[GE users] Reserving memory question.

bdbaddog bill at baddogconsulting.com
Wed Jan 20 23:18:36 GMT 2010


Greetings,

On Sat, Dec 5, 2009 at 2:00 AM, mlelstv <mlelstv at serpens.de> wrote:
> On Fri, Dec 04, 2009 at 11:40:57PM -0800, gutnik wrote:
>
>> Great. That ... seems to do just what I wanted. Why is it necessary
>> to specify the complex? SGE clearly already knows how much memory
>> is available.
>
> There is a load value "mem_free" which happens to represent the currently
> free memory on a specific host.
>
> When you ask for a resource "mem_free" then this is checked
> against the load value.
>
>
> There might also be a complex variable with the same name "mem_free"
> configured for a host.
>
> If you set this variable then the resource request for "mem_free"
> is checked against the minimum of the load value and the complex
> variable.
>
> Since the complex variable is also defined as consumable its value
> is reduced by the requested value as long as the specific job is
> running.
>
> Example:
>
>    qconf -se HOST
>
> shows you the original (configured) value of the complex
> variable "mem_free" and the load value "mem_free". E.g.
>
> hostname              node12345
> load_scaling          NONE
> complex_values        slots=5,mem_free=15.7G
> load_values           arch=lx24-amd64,num_proc=4,mem_total=16052.765625M, \
>                       swap_total=2055.179688M,virtual_total=18107.945312M, \
>                       load_avg=0.000000,load_short=0.000000, \
>                       load_medium=0.000000,load_long=0.000000, \
>                       mem_free=15895.773438M,swap_free=2042.644531M, \
>
> and
>    qhost -F mem_free HOST
>
> shows you the computed minimum of the load value and the
> reduced complex variable:
>
> node12345               lx24-amd64      4  0.00   15.7G  158.0M    2.0G   12.5M
>    Host Resource(s):      hl:mem_free=15.521G
>
> When I start a job with a request like
>    echo "sleep 300" | qsub -l mem_free=8G -j y -o /dev/null
> then the computed minimum changes:
>
> node12345               lx24-amd64      4  0.00   15.7G  158.1M    2.0G   12.5M
>    Host Resource(s):      hc:mem_free=7.700G
>
> to the configured value minus the requested value. And when the job completes
> it changes back again:
>
> node12345               lx24-amd64      4  0.00   15.7G  157.5M    2.0G   12.5M
>    Host Resource(s):      hl:mem_free=15.523G
>
> You see that the reported Host Resource is smaller than the configured
> value because some memory is used by the system and the load value
> is therefore smaller.
>
> I don't think there is a way to report the reduced value of the complex
> alone. But then nobody cares about this value; only the reported minimum is
> used as the 'Host Resource' and is compared to the requested resource.
>
> Some people might configure the complex variable "mem_free" to be
> slightly less than "mem_total". As you can see from the example
> there is some fuzz caused by memory used by the system that could
> be accounted for by such a configuration.
>
>
> Now, why does SGE not automatically provide a complex variable
> and initialize it this way? Probably because it only knows that
> a load value and a complex variable of the same name should be
> treated this way.
> It doesn't know that "mem_free" and "mem_total" are related,
> and as described above, some people prefer slightly different
> values.

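The rule described in the quoted message can be written down as a small model. This is only a sketch of the arithmetic as explained above, not SGE's actual scheduler code; the function name reported_mem_free and the use of plain gigabyte floats are my own assumptions for illustration.

```python
def reported_mem_free(load_value_g, configured_g=None, requested_g=()):
    """Model of the 'Host Resource' value described in the quoted message.

    load_value_g : the "mem_free" load value reported by the host, in GB
    configured_g : the "mem_free" entry from complex_values, in GB,
                   or None if no complex value is set on the host
    requested_g  : mem_free requests of the jobs currently running there
    (All names here are hypothetical; they are not SGE identifiers.)
    """
    if configured_g is None:
        # No complex value configured: only the load value is checked.
        return load_value_g
    # Consumable: the configured value is reduced by each running request.
    remaining = configured_g - sum(requested_g)
    # The reported host resource is the minimum of the two.
    return min(load_value_g, remaining)


# Numbers from the node12345 example (the load value is given in MB).
load_g = 15895.773438 / 1024
idle = reported_mem_free(load_g, 15.7)
busy = reported_mem_free(load_g, 15.7, [8.0])
print(f"idle: {idle:.3f}G, with 8G job: {busy:.3f}G")
```

With the 8G job running, the model gives 15.7G - 8G = 7.7G, which agrees with the hc:mem_free=7.700G line in the quoted output. With no job running, the load value is the smaller of the two and is what gets reported.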

I've tried the above on 6.2u4, but I'm not seeing what you describe:
$ qconf -se pm-1-1
hostname              pm-1-1.ta.com
load_scaling          NONE
complex_values        mem_free=7.8G
load_values           load_avg=2.490000,load_short=1.430000, \
                      load_medium=2.490000,load_long=3.260000,arch=lx24-amd64, \
                      num_proc=4,mem_free=7519.761719M, \
                      swap_free=63962.347656M,virtual_free=71482.109375M, \
                      mem_total=7982.730469M,swap_total=64001.128906M, \
                      virtual_total=71983.859375M,mem_used=462.968750M, \
                      swap_used=38.781250M,virtual_used=501.750000M, \
                      cpu=25.000000,np_load_avg=0.622500, \
                      np_load_short=0.357500,np_load_medium=0.622500, \
                      np_load_long=0.815000
processors            4
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE
$ qhost -F mem_free pm-1-1
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
pm-1-1                  lx24-amd64      4  2.49    7.8G  463.0M   62.5G   38.8M
    Host Resource(s):      hl:mem_free=7.344G

$ echo "sleep 300" | qsub -l mem_free=7G -j y -o /dev/null
Your job 644 ("STDIN") has been submitted

$ qhost -F mem_free pm-1-1
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
pm-1-1                  lx24-amd64      4  2.16    7.8G  577.1M   62.5G   38.8M
    Host Resource(s):      hl:mem_free=7.232G

I'm not seeing any reduction in pm-1-1's mem_free.
What am I missing?
Do I need 6.2u5 for this?
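
One detail that may help when comparing the outputs: as far as I understand the qhost/qstat man pages, the two letters before the colon say where the value comes from (first letter the level, second letter the source). The sketch below decodes them. The helper name decode_resource and the lookup tables are illustrative assumptions on my part, not anything shipped with SGE.

```python
# Meaning of the two-letter prefix, as I read the qstat/qhost man pages.
LEVELS = {"g": "global", "h": "host", "q": "queue"}
SOURCES = {
    "l": "load value",
    "L": "load value after load scaling",
    "c": "consumable availability",
    "f": "fixed (non-consumable) value",
}


def decode_resource(entry):
    """Split a line such as 'hl:mem_free=7.232G' into its parts."""
    prefix, _, assignment = entry.strip().partition(":")
    if len(prefix) != 2 or "=" not in assignment:
        raise ValueError(f"unexpected resource entry: {entry!r}")
    name, _, value = assignment.partition("=")
    return {
        "level": LEVELS.get(prefix[0], "unknown"),
        "source": SOURCES.get(prefix[1], "unknown"),
        "name": name,
        "value": value,
    }


for line in ("hc:mem_free=7.700G", "hl:mem_free=7.232G"):
    info = decode_resource(line)
    print(info["name"], info["value"], "from", info["level"], info["source"])
```

In the node12345 example the line switches from hl to hc while the job runs. On pm-1-1 it stays hl after the job is submitted, so the value shown is still the plain load value rather than a consumable.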

Thanks,
Bill

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=240017
