[GE users] h_vmem, virtual_free
Heywood, Todd
heywood at cshl.edu
Fri Feb 23 16:15:41 GMT 2007
Hi,
I have a puzzle: a certain program (vmatch) runs fine outside of
SGE on a 2GB node, but it does not run when submitted to SGE as a
job, and strace shows that it runs into memory allocation errors. I
have the resource virtual_free defined as requestable and consumable, and
h_vmem as requestable but NOT consumable. Requesting "-l virtual_free=1.9G"
on a 2GB node, or "-l virtual_free=3.8G" on a 4GB node, works as
expected in that the job runs only when enough memory is available,
and SGE subtracts the requested amount from virtual_free when the job
starts running. However, the job still gets the memory allocation error.
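To make the virtual_free bookkeeping described above concrete, here is a toy model of a host consumable in Python. It is a sketch under assumptions, not SGE's scheduler code: the class and method names (Host, try_start, finish) are hypothetical, and sizes are plain GiB floats. It shows only that a consumable gates dispatch and is decremented while the job runs; nothing in the model limits how much memory the job itself may allocate.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy model of one execution host with a consumable 'virtual_free' (GiB)."""
    name: str
    virtual_free: float
    running: dict = field(default_factory=dict)  # job name -> reserved amount

    def try_start(self, job: str, request: float) -> bool:
        # Dispatch only if the remaining consumable covers the request.
        if request > self.virtual_free:
            return False
        # Bookkeeping only: the amount is reserved, but no limit is put on the job.
        self.virtual_free -= request
        self.running[job] = request
        return True

    def finish(self, job: str) -> None:
        # Give the reserved amount back when the job ends.
        self.virtual_free += self.running.pop(job)
```

For example, with a capacity of 3.8 on blade1, two requests of 1.9 fit and a third waits until one of them finishes.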
By contrast, if I use "-l h_vmem=4G" and submit the job to either a 4GB or
2GB (!) node, the job runs fine with no errors.
This makes no sense to me, especially since the job runs on a 2GB node
with h_vmem=4G specified. Can anyone explain?
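Unlike virtual_free, h_vmem is one of the queue resource limits; as I understand queue_conf(5), SGE applies it to the job through setrlimit as a cap on virtual address space, not on physical RAM. The Python sketch below mimics that mechanism locally with the standard resource module (RLIMIT_AS, Linux/Unix only): a forked child gets an address-space cap and then tries one allocation. The helper name try_allocate_under_limit is hypothetical, and this is an illustration of the mechanism, not SGE code.

```python
import os
import resource

MiB = 1024 ** 2
GiB = 1024 ** 3


def try_allocate_under_limit(limit_bytes, alloc_bytes: int) -> bool:
    """Fork a child, optionally cap its address space, and try one allocation.

    limit_bytes=None means no cap. Returns True if the child's allocation
    succeeded and False otherwise. Unix only (os.fork, resource).
    """
    pid = os.fork()
    if pid == 0:
        code = 2  # 2 = setup failed before the allocation was attempted
        try:
            if limit_bytes is not None:
                _, hard = resource.getrlimit(resource.RLIMIT_AS)
                if hard != resource.RLIM_INFINITY:
                    limit_bytes = min(limit_bytes, hard)
                # Cap virtual address space (soft and hard), in the spirit of h_vmem.
                resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
            try:
                buf = bytearray(alloc_bytes)  # zero-filled, so the pages are touched
                del buf
                code = 0
            except MemoryError:
                code = 1  # the allocation was refused by the address-space cap
        finally:
            os._exit(code)  # never fall back into the parent's code path
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

For example, a 256 MiB cap makes a 1 GiB allocation fail, while with no cap a 16 MiB allocation succeeds. Because RLIMIT_AS caps address space rather than physical memory, setting it above the installed RAM is permitted.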
Here is the qhost output for a 4GB node. I'm not sure why h_vmem isn't
reported (my global execution host reporting variables are set to:
cpu, h_vmem, mem_free, np_load_avg, s_vmem, virtual_free).
[root@bhmnode2 tmp]# qhost -F -h blade1
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
blade1                  lx24-amd64      4  0.17    3.9G  242.6M    1.0G   21.2M
hl:arch=lx24-amd64
hl:num_proc=4.000000
hl:mem_total=3.861G
hl:swap_total=1.004G
hl:virtual_total=4.865G
hl:load_avg=0.170000
hl:load_short=0.000000
hl:load_medium=0.170000
hl:load_long=0.230000
hl:mem_free=3.624G
hl:swap_free=1006.340M
hc:virtual_free=3.800G
hl:mem_used=242.598M
hl:swap_used=21.246M
hl:virtual_used=263.844M
hl:cpu=0.000000
hl:tmpfree=59.128G
hl:tmptot=64.702G
hl:tmpused=2.287G
hl:np_load_avg=0.042500
hl:np_load_short=0.000000
hl:np_load_medium=0.042500
hl:np_load_long=0.057500
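The -F lines above follow a simple prefix:name=value pattern. As I read qhost(1), hl: marks a host load value and hc: a host consumable, which is why virtual_free (configured as consumable) appears as hc:. The Python sketch below, with hypothetical function names, parses such lines and converts memory suffixes following the sge_types(1) convention as I understand it (K/M/G are powers of 1024, k/m/g powers of 1000); non-numeric values such as arch are kept as strings.

```python
import re

# Memory suffixes (assumed sge_types(1) convention): upper case is 1024-based,
# lower case is 1000-based.
_SUFFIX = {"K": 1024, "M": 1024**2, "G": 1024**3,
           "k": 1000, "m": 1000**2, "g": 1000**3}

# A resource line from `qhost -F`: optional indent, two-letter prefix, name=value.
_LINE = re.compile(r"^\s*([ghq][a-zA-Z]):(\w+)=(\S+)\s*$")


def parse_value(text: str):
    """Return a float (bytes for memory values) or the raw string if non-numeric."""
    try:
        if text[-1] in _SUFFIX:
            return float(text[:-1]) * _SUFFIX[text[-1]]
        return float(text)
    except ValueError:
        return text  # e.g. arch=lx24-amd64


def parse_qhost_resources(output: str) -> dict:
    """Map resource name -> (prefix, value) for lines such as 'hl:mem_total=3.861G'."""
    resources = {}
    for line in output.splitlines():
        match = _LINE.match(line)
        if match:
            prefix, name, raw = match.groups()
            resources[name] = (prefix, parse_value(raw))
    return resources
```

With the values shown for blade1, this makes it easy to check that virtual_total (4.865G) is the sum of mem_total (3.861G) and swap_total (1.004G), and that virtual_used (263.844M) is the sum of mem_used and swap_used.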
Thanks for any ideas!
Todd Heywood
More information about the gridengine-users mailing list