[GE users] virtual memory
bug at sas.upenn.edu
Tue Nov 2 19:18:17 GMT 2010
I have to give some background to get to my questions at the bottom...
I have a cluster of CentOS Linux machines, all with 8 cores, 8G of
memory and 2G of swap. Exec nodes were crashing because of memory
exhaustion. This prompted me to implement hard memory limits and
make h_vmem consumable. The main user application is Matlab
Distributed Computing Server.
I set exec node attributes in a for loop:
# qconf -rattr exechost complex_values slots=8,virtual_free=10G node01
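The loop itself looks something like the following sketch. The node
names node01 through node04 are placeholders for your own hostnames,
and the leading echo makes it a dry run; drop the echo to actually
apply the settings.

```shell
# Dry run: print the qconf command for each exec host.
# Remove "echo" to really set slots and virtual_free per node.
for n in node01 node02 node03 node04; do
    echo qconf -rattr exechost complex_values slots=8,virtual_free=10G "$n"
done
```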
I modify the complex configuration, setting h_vmem to consumable (YES):
# qconf -mc
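After the edit, the h_vmem line in the complex configuration ends up
looking roughly like this (the 2g default is an example value, not the
shipped default; pick whatever suits your cluster):

```
#name    shortcut  type    relop  requestable  consumable  default  urgency
h_vmem   h_vmem    MEMORY  <=     YES          YES         2g       0
```

With consumable set to YES, each running job's h_vmem request is
subtracted from the node's available total, so the scheduler stops
placing jobs once memory is spoken for.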
I set the default options for jobs in the cluster-wide sge_request file.
Note that if you do not set h_stack, Matlab and Python will refuse to start.
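Assuming the file in question is the cluster-wide default request file,
typically $SGE_ROOT/default/common/sge_request, the defaults might look
like this (the 2g/128m values are examples):

```
# $SGE_ROOT/default/common/sge_request (path assumed)
-l h_vmem=2g
-l h_stack=128m
```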
Users can still request whatever size they want, larger or smaller:
$ qsub -l h_vmem=4g -l h_stack=256m myjob.sh
I also enable job info by setting schedd_job_info to true:
# qconf -msconf
Now users can see how much memory their jobs are actually using, and get
some accounting info during or after the fact:
$ qstat -j $JOBNUM
$ qacct -j $JOBNUM
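A quick way to pull just the memory figures out of those reports is to
grep for the usage fields. JOBNUM is assumed to hold a job id, and the
commands are guarded so this is a no-op on a machine without the SGE
client tools installed:

```shell
# Extract memory usage from SGE job reports (no-op if qstat is absent).
if command -v qstat >/dev/null 2>&1; then
    qstat -j "$JOBNUM" | grep usage      # live usage: vmem, maxvmem, etc.
    qacct -j "$JOBNUM" | grep maxvmem    # peak vmem after completion
fi
```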
So, now I have a setup that will kill runaway jobs before they kill the
node. If I look at a node where a job is running, top shows:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6560 myuser 17 0 1477m 226m 45m S 99.8 2.8 14:39.41 MATLAB
The process is only using 226m of physical memory. Yes, the virtual
memory allocation is 1477m, but I assume that most of that is paged out
or shared, not resident in RAM.
Why is the virtual so high? Am I missing something? Shouldn't the hard
limit be on the actual physical memory usage, not the virtual? Is vmem
the only predictable metric for the memory footprint of a job instance?
Is there a hard limit we can set on physical RAM, not virtual?
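The gap between VIRT and RES is easy to observe for any process on a
Linux box, because mapped address space (VmSize) routinely dwarfs the
pages actually resident in RAM (VmRSS). For example, inspecting the
shell's own /proc entry:

```shell
# Compare a process's virtual address space (VmSize) with its
# resident set (VmRSS); $$ is this shell's own PID. Linux /proc assumed.
grep -E '^Vm(Size|RSS)' /proc/$$/status
```

This is exactly why a hard limit keyed to virtual size bites long
before physical RAM is exhausted: libraries, thread stacks, and lazily
allocated heap all count against VmSize without ever touching a
physical page.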
If we run five of these jobs, a node is full, but there are still free
cores and free physical RAM. This is not optimal. How are others
reining in their job memory usage effectively while still using the
system to the fullest?
Gavin W. Burris
Senior Systems Programmer
Information Security and Unix Systems
School of Arts and Sciences
University of Pennsylvania