[GE users] h_vmem, virtual_free

Reuti reuti at staff.uni-marburg.de
Fri Feb 23 17:22:55 GMT 2007



Hi,

On 23.02.2007 at 17:15, Heywood, Todd wrote:

> Hi,
>
> I have a puzzle, where a certain program (vmatch) runs fine outside
> of SGE on a 2GB node. However, it does not run when submitted as a
> job to SGE, and using strace shows it runs into memory allocation
> errors. I have resource virtual_free defined as requestable and
> consumable, and h_vmem is requestable but NOT consumable. Using "-l
> virtual_free=1.9G" on a 2GB node, or "-l virtual_free=3.8G" for a
> 4GB node works as expected in that the job runs only when there is
> enough memory available and SGE subtracts the requested amount from
> virtual_free when the job starts running. However the job still
> gets the memory allocation error.
>
> However, if I use "-l h_vmem=4G" and submit the job to either a 4GB
> or 2GB (!) node, the job runs fine with no errors.
>
> This makes no sense to me, especially when the job runs on a 2GB
> node with h_vmem=4G specified. Can anyone explain?
>
> Here's the qhost output for a 4GB node. I'm not sure why h_vmem
> isn't reported (my global execution host reporting variables are
> defined to be: cpu, h_vmem, mem_free, np_load_avg, s_vmem,
> virtual_free).
h_vmem is a queue attribute, so "qstat -F" should show it.

a) Is there an h_vmem defined in the queues, which will be used if
the user doesn't request one?

b) Some programs need h_stack limited to an even lower value, if
and only if h_vmem is set to something other than unlimited. Note that
SGE will also set h_data and h_stack to the same value as h_vmem,
unless they are defined with a lower value than h_vmem.

c) What do "ulimit -Ha" and "ulimit -Hs" show on the node?

d) You could also run the commands from c) in a jobscript to check the
limits defined for this job.
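
For c) and d), here is a minimal sketch of such a check in Python, using
the standard "resource" module to read the same hard limits that
"ulimit -H" reports. The names HARD_LIMITS, format_limit and hard_limits
are made up for this example. I am assuming that the address space, stack
and data limits are the ones affected by h_vmem, h_stack and h_data on
your platform. Note that Python reports bytes, while ulimit prints these
three limits in kilobytes.

```python
import resource

# Hard limits to inspect, keyed by the matching ulimit flag.
# Assumption: these are the limits SGE sets for h_vmem, h_stack and h_data.
HARD_LIMITS = {
    "ulimit -Hv (address space)": resource.RLIMIT_AS,
    "ulimit -Hs (stack)": resource.RLIMIT_STACK,
    "ulimit -Hd (data segment)": resource.RLIMIT_DATA,
}


def format_limit(value):
    """Render a limit in bytes, or 'unlimited' for RLIM_INFINITY."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes" % value


def hard_limits():
    """Return the hard limit of the current process for each entry."""
    # getrlimit() returns a (soft, hard) tuple; index 1 is the hard limit.
    return {
        name: resource.getrlimit(res)[1]
        for name, res in HARD_LIMITS.items()
    }


if __name__ == "__main__":
    for name, hard in hard_limits().items():
        print("%-28s %s" % (name, format_limit(hard)))
```

Running it once from a login shell on the node and once from inside a
submitted job should show whether the job gets a different stack or
address space limit than the interactive run.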

-- Reuti

>
> [root@bhmnode2 tmp]# qhost -F -h blade1
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> blade1                  lx24-amd64      4  0.17    3.9G  242.6M    1.0G   21.2M
>    hl:arch=lx24-amd64
>    hl:num_proc=4.000000
>    hl:mem_total=3.861G
>    hl:swap_total=1.004G
>    hl:virtual_total=4.865G
>    hl:load_avg=0.170000
>    hl:load_short=0.000000
>    hl:load_medium=0.170000
>    hl:load_long=0.230000
>    hl:mem_free=3.624G
>    hl:swap_free=1006.340M
>    hc:virtual_free=3.800G
>    hl:mem_used=242.598M
>    hl:swap_used=21.246M
>    hl:virtual_used=263.844M
>    hl:cpu=0.000000
>    hl:tmpfree=59.128G
>    hl:tmptot=64.702G
>    hl:tmpused=2.287G
>    hl:np_load_avg=0.042500
>    hl:np_load_short=0.000000
>    hl:np_load_medium=0.042500
>    hl:np_load_long=0.057500
>
> Thanks for any ideas!
>
> Todd Heywood
