[GE users] h_vmem, virtual_free

Heywood, Todd heywood at cshl.edu
Fri Feb 23 16:15:41 GMT 2007


Hi,

 

I have a puzzle: a certain program (vmatch) runs fine outside of SGE on
a 2GB node, but fails when submitted as an SGE job, and strace shows it
hitting memory allocation errors. I have the resource virtual_free
defined as requestable and consumable; h_vmem is requestable but NOT
consumable. Using "-l virtual_free=1.9G" on a 2GB node, or
"-l virtual_free=3.8G" on a 4GB node, works as expected for scheduling:
the job runs only when enough memory is available, and SGE subtracts the
requested amount from virtual_free when the job starts running. However,
the job still gets the memory allocation error.
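
To make the bookkeeping I am describing concrete, here is a minimal
Python sketch of how a consumable such as virtual_free gets debited. It
is an illustration only, not SGE code: parse_mem and Host are
hypothetical names, and the suffix handling assumes the Grid Engine
convention that upper-case K/M/G are multiples of 1024 and lower-case
k/m/g are multiples of 1000.

```python
# Hypothetical model of consumable accounting; not taken from SGE itself.
MULTIPLIERS = {
    "k": 1000, "K": 1024,
    "m": 1000**2, "M": 1024**2,
    "g": 1000**3, "G": 1024**3,
}

def parse_mem(text):
    """Convert a memory string such as '1.9G' or '242.6M' to bytes."""
    text = text.strip()
    if text and text[-1] in MULTIPLIERS:
        return int(float(text[:-1]) * MULTIPLIERS[text[-1]])
    return int(float(text))  # no suffix: plain byte count

class Host:
    """Tracks the remaining amount of one consumable on one host."""

    def __init__(self, virtual_free):
        self.virtual_free = parse_mem(virtual_free)

    def try_start(self, request):
        """Debit the request if it fits; return whether the job may start."""
        need = parse_mem(request)
        if need > self.virtual_free:
            return False
        self.virtual_free -= need
        return True

node = Host("3.8G")             # capacity configured for the 4GB node
print(node.try_start("3.8G"))   # True: the request fits and is subtracted
print(node.try_start("1.9G"))   # False: nothing is left for a second job
```

Note that nothing in this model touches the running process: it only
decides whether a job may start on the host.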

 

But if I use "-l h_vmem=4G" and submit the job to either a 4GB or a
2GB (!) node, it runs fine with no errors.

 

This makes no sense to me, especially when the job runs on a 2GB node
with h_vmem=4G specified. Can anyone explain?
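
One thing that can be checked from inside the job is the address-space
limit the process actually inherits. The following is a minimal sketch,
assuming (this is not established anywhere in this message) that h_vmem
reaches the job as a per-process RLIMIT_AS limit; format_limit and
report_limits are hypothetical helper names.

```python
import resource

def format_limit(value):
    """Render an rlimit value in GiB, or 'unlimited' if no limit is set."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%.3fG" % (value / 1024**3)

def report_limits():
    """Return the soft and hard address-space limits of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return {"soft": format_limit(soft), "hard": format_limit(hard)}

if __name__ == "__main__":
    for name, value in report_limits().items():
        print("RLIMIT_AS %s limit: %s" % (name, value))
```

Running this from a login shell and then as a job, with and without
"-l h_vmem=4G", would show whether the limits differ between the cases.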

 

Here's the qhost output for a 4GB node. I'm not sure why h_vmem isn't
reported (my global execution host reporting variables are defined to
be: cpu, h_vmem, mem_free, np_load_avg, s_vmem, virtual_free).

 

 

[root@bhmnode2 tmp]# qhost -F -h blade1
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
blade1                  lx24-amd64      4  0.17    3.9G  242.6M    1.0G   21.2M
   hl:arch=lx24-amd64
   hl:num_proc=4.000000
   hl:mem_total=3.861G
   hl:swap_total=1.004G
   hl:virtual_total=4.865G
   hl:load_avg=0.170000
   hl:load_short=0.000000
   hl:load_medium=0.170000
   hl:load_long=0.230000
   hl:mem_free=3.624G
   hl:swap_free=1006.340M
   hc:virtual_free=3.800G
   hl:mem_used=242.598M
   hl:swap_used=21.246M
   hl:virtual_used=263.844M
   hl:cpu=0.000000
   hl:tmpfree=59.128G
   hl:tmptot=64.702G
   hl:tmpused=2.287G
   hl:np_load_avg=0.042500
   hl:np_load_short=0.000000
   hl:np_load_medium=0.042500
   hl:np_load_long=0.057500
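
To check mechanically which attributes a host reports, the attribute
lines above can be parsed. This is a minimal sketch, assuming the
"prefix:name=value" layout shown in this output; parse_attributes is a
hypothetical helper, and the sample lines are copied from the listing
above.

```python
def parse_attributes(lines):
    """Map attribute name -> (prefix, value) for lines like 'hl:mem_free=3.624G'."""
    attrs = {}
    for line in lines:
        line = line.strip()
        if ":" not in line or "=" not in line:
            continue  # skip the table header, separator and blank lines
        prefix, rest = line.split(":", 1)
        name, value = rest.split("=", 1)
        attrs[name] = (prefix, value)
    return attrs

sample = """
   hl:mem_total=3.861G
   hl:mem_free=3.624G
   hc:virtual_free=3.800G
""".splitlines()

attrs = parse_attributes(sample)
print(attrs["virtual_free"])   # ('hc', '3.800G')
print("h_vmem" in attrs)       # False: h_vmem is not among the reported lines
```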

 

 

Thanks for any ideas!

 

Todd Heywood

 



