[GE users] virtual memory

reuti reuti at staff.uni-marburg.de
Tue Nov 2 19:34:05 GMT 2010


On 02.11.2010 at 20:18, bug wrote:

> I have to give some background to get to my questions at the bottom...
> I have a cluster of CentOS Linux machines, all with 8 cores, 8G of
> memory and 2G of swap.  Exec nodes were crashing because of memory
> exhaustion.  This has prompted me to implement hard memory limits and
> make h_vmem consumable.  The main user application is Matlab Distributed
> Computing Environment.
> I set exec node attributes, in a for loop:
> # qconf -rattr exechost complex_values slots=8,virtual_free=10G node01
> I modify the complex configuration, setting h_vmem consumable to YES:
> # qconf -mc
> I set the default options for jobs in the
> $SGE_ROOT/default/common/sge_request file:
> -l h_vmem=2g
> -l h_stack=128m
> Note that if you do not set h_stack, Matlab and Python will refuse to start.
> Users can still request whatever size they want, larger or smaller:
> $ qsub -l h_vmem=4g -l h_stack=256m myjob.sh
> I also enable job info by setting schedd_job_info to true:
> # qconf -msconf
> Now users can see how much memory their jobs are actually using, and get
> some accounting info during or after the fact:
> $ qstat -j $JOBNUM
> $ qacct -j $JOBNUM
> So, now I have a setup that will kill runaway jobs before they kill the
> exec node.
> If I look on the node where a job is running, I see:
> 6560 myuser  17  0 1477m 226m  45m S 99.8  2.8 14:39.41  MATLAB

AFAIK these 1477m can also be the result of a malloc where not all of the reserved memory has been accessed yet. Only once the reserved memory is actually filled with data will it show up as being used.
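
To make the difference between reserved and used memory visible, here is a minimal sketch in Python. It assumes a Linux host with /proc available; the helper names (vm_kb, reserve_then_touch) and the 64 MiB size are made up for illustration, and an anonymous mmap stands in for a large malloc. It reserves a block, then writes to every page, and reports how the virtual size and the resident size grow at each step.

```python
import mmap
import os

STATUS = "/proc/self/status"  # Linux-only; assumed to be available


def vm_kb():
    """Return (VmSize, VmRSS) of this process in kB, read from /proc."""
    fields = {}
    with open(STATUS) as fh:
        for line in fh:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                fields[key] = int(rest.split()[0])
    return fields["VmSize"], fields["VmRSS"]


def reserve_then_touch(size=64 * 1024 * 1024):
    """Map `size` bytes, then write to every page; report growth in kB."""
    vsz0, rss0 = vm_kb()
    # Private anonymous mapping: address space is reserved, pages untouched.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    vsz1, rss1 = vm_kb()
    # Writing one byte per page forces the kernel to back it with real memory.
    for offset in range(0, size, mmap.PAGESIZE):
        buf[offset] = 1
    vsz2, rss2 = vm_kb()
    buf.close()
    return {
        "virt_after_reserve": vsz1 - vsz0,
        "rss_after_reserve": rss1 - rss0,
        "rss_after_touch": rss2 - rss0,
    }


if __name__ == "__main__" and os.path.exists(STATUS):
    for name, kb in reserve_then_touch().items():
        print(f"{name}: {kb} kB")
```

On a typical Linux box the virtual size should jump by roughly the full block right after the reservation, while the resident size should only follow once the pages have been written. A limit on h_vmem counts against the first number, not the second.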

> The process is only using 226m of physical memory.  Yes, the virtual
> memory allocation is 1477m, but I assume that most of that is on disk or
> in dynamic libraries.
> Why is the virtual so high?  Am I missing something?  Shouldn't the hard
> limit be on the actual physical memory usage, not the virtual?  Is vmem
> the only predictable metric for the memory footprint of a job instance?
> Is there a hard limit we can set on physical RAM, not virtual?

You made h_vmem consumable (and AFAICS no limit was set at the execution host level). Instead of setting "virtual_free" there (which is never requested), I would suggest defining "h_vmem" there. But unless you define some arbitrarily high value, it won't change the behavior.

SGE can't know whether an application's actual memory usage is static or will increase up to the requested value over the lifetime of a job. So it can only treat the reserved memory as really reserved and can't grant it to other jobs.
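
As a rough illustration of this bookkeeping, here is a small Python sketch. It is not SGE code; the function name jobs_that_fit is made up, and it assumes a per-host memory limit of 10G (the figure from the quoted setup) and works in MiB for simplicity. It only shows that a consumable is booked by the requested amount, so the number of jobs per host is the smaller of the slot count and the limit divided by the request.

```python
def jobs_that_fit(host_mem_mb, request_mb, slots):
    """Jobs a host can accept when memory is booked per request, not per use."""
    if request_mb <= 0:
        raise ValueError("request_mb must be positive")
    return min(slots, host_mem_mb // request_mb)


if __name__ == "__main__":
    host_mem_mb = 10 * 1024  # assumed per-host limit, taken from the 10G above
    slots = 8
    for request_mb in (2048, 1536, 1280):
        n = jobs_that_fit(host_mem_mb, request_mb, slots)
        print(f"request {request_mb}M -> {n} jobs on {slots} slots")
```

With the 2G default the memory limit is exhausted before the slots are, which matches the five jobs per node you observe. A smaller request per job is the only thing that changes the result under these assumptions.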

> If we run five of these jobs, a node is full, but there are still free
> cores and free physical ram.  This is not optimal.  How are others
> reining in their job memory usage effectively and still using the
> system to the fullest?

Well: a better forecast of the amount of memory each job actually needs?
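
One possible way to get such a forecast is to look at the peak virtual sizes of earlier runs of the same kind of job, for example from the accounting data, and request a bit more than the largest one. A minimal Python sketch follows; the function name suggest_h_vmem_mb, the 20% headroom, and the 128M rounding step are assumptions for illustration, not a recommendation from SGE. Only the 1477m figure comes from the output above, the other sample values are hypothetical.

```python
import math


def suggest_h_vmem_mb(peaks_mb, headroom=1.2, step_mb=128):
    """Suggest an h_vmem request (in MiB) from observed peak virtual sizes.

    Takes the largest observed peak, adds headroom, and rounds up to the
    next multiple of step_mb. All three parameters are illustrative.
    """
    if not peaks_mb:
        raise ValueError("need at least one observed peak")
    target = max(peaks_mb) * headroom
    return int(math.ceil(target / step_mb)) * step_mb


if __name__ == "__main__":
    # 1477 is the VIRT value shown above; the other two are hypothetical.
    observed = [1477, 1390, 1502]
    print(f"-l h_vmem={suggest_h_vmem_mb(observed)}M")
```

The result would then go into the qsub request in place of the 2g default. If the peaks of a job type vary a lot between runs, the headroom has to be larger, and the gain over the default shrinks accordingly.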

-- Reuti

> Cheers,
> -- 
> Gavin W. Burris
> Senior Systems Programmer
> Information Security and Unix Systems
> School of Arts and Sciences
> University of Pennsylvania
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=292240
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

