[GE users] SGE Job Memory Usage

mhanby mhanby at uab.edu
Wed Apr 14 17:44:50 BST 2010


Grid Engine 6.2u5
CentOS 5 x86_64 with kernel 2.6.18-128.7.1.el5

We are working on getting our users to consider how much memory their job tasks will use. Currently, we are requiring that they request mem_free, but ultimately I'd like to get them using h_vmem.

Before we can use h_vmem, the users have been asking the obvious question: how do I tell how much memory my job used, so that I know how much to request the next time I run it?

For a currently running job, if I look at:

$ qstat -j 230781 |grep ^usage
usage  1:  cpu=01:23:58, mem=18004.07080 GBs, io=0.00000, vmem=3.859G, maxvmem=4.077G

Is this telling me that the job currently has used no more than 4.077GB of RAM (virtual and physical)?
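For reference, here is a minimal sketch of how that usage line could be pulled apart for comparison. It assumes (my reading of the accounting(5) man page, not something confirmed in this thread) that "mem" is an integral in GB-seconds, so dividing it by the CPU seconds gives a rough average rather than a peak. It also assumes the K/M/G suffixes are 1024-based. The function names are just illustrative.

```python
def parse_usage(line):
    """Split a qstat -j 'usage' line into a dict of raw string values."""
    # Everything after the first colon is the comma-separated key=value list.
    _, _, rest = line.partition(":")
    fields = {}
    for item in rest.split(","):
        key, _, value = item.strip().partition("=")
        fields[key] = value
    return fields


def cpu_to_seconds(cpu):
    """Convert hh:mm:ss (or d:hh:mm:ss, an assumed long-job format) to seconds."""
    parts = [int(p) for p in cpu.split(":")]
    parts = [0] * (4 - len(parts)) + parts  # left-pad to days, hours, minutes, seconds
    days, hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def size_to_gib(size):
    """Convert a value such as '3.859G' to GiB, assuming 1024-based K/M/G suffixes."""
    factors = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0}
    return float(size[:-1]) * factors[size[-1]]


line = "usage  1:  cpu=01:23:58, mem=18004.07080 GBs, io=0.00000, vmem=3.859G, maxvmem=4.077G"
usage = parse_usage(line)
cpu_s = cpu_to_seconds(usage["cpu"])
mem_gb_s = float(usage["mem"].split()[0])  # drop the trailing "GBs" unit
print("cpu seconds:       ", cpu_s)
print("average GB (approx):", round(mem_gb_s / cpu_s, 3))
print("maxvmem in GiB:    ", size_to_gib(usage["maxvmem"]))
```

If the GB-seconds assumption holds, the average works out to a bit under the reported vmem, which is at least consistent with the vmem and maxvmem fields on the same line.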

If I look at the process for this job on the compute node, ps reports it as using 48% of the system's RAM (on a 16GB system), which would be roughly 7.6GB.

$ ssh compute-1-5 ps auxf|grep jsmith
jsmith  28100  0.0  0.0  67992  1336 ?        Ss   10:04   0:00  |   \_ -bash /opt/gridengine/default/spool/compute-1-5/job_scripts/230781
jsmith  28206  0.0  0.0  65900  1208 ?        S    10:04   0:00  |       \_ sh /share/apps/R/R-2.9.0/gnu/lib/R/bin/Rcmd BATCH mainCode_Xthin_5.R run_thin_5.out
jsmith  28210 99.4 48.5 8106856 7972440 ?     R    10:04  90:11  |           \_ /share/apps/R/R-2.9.0/gnu/lib/R/bin/exec/R -f mainCode_Xthin_5.R --restore --save --no-readline
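To put the ps columns in the same units as the qstat figures, here is a small sketch that converts VSZ and RSS for the R process. It relies on ps aux reporting both columns in KiB. The helper name parse_ps_aux is hypothetical.

```python
KIB_PER_GIB = 1024 ** 2


def parse_ps_aux(line):
    """Parse one `ps aux` line; VSZ and RSS are reported in KiB."""
    # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    cols = line.split(None, 10)
    return {
        "user": cols[0],
        "pid": int(cols[1]),
        "pct_mem": float(cols[3]),
        "vsz_kib": int(cols[4]),
        "rss_kib": int(cols[5]),
        "command": cols[10],
    }


line = ("jsmith  28210 99.4 48.5 8106856 7972440 ?     R    10:04  90:11  "
        "|           \\_ /share/apps/R/R-2.9.0/gnu/lib/R/bin/exec/R "
        "-f mainCode_Xthin_5.R --restore --save --no-readline")
proc = parse_ps_aux(line)
vsz_gib = proc["vsz_kib"] / KIB_PER_GIB
rss_gib = proc["rss_kib"] / KIB_PER_GIB
print(f"pid {proc['pid']}: VSZ {vsz_gib:.2f} GiB, RSS {rss_gib:.2f} GiB")
```

Both values come out well above the maxvmem=4.077G shown by qstat, which is the discrepancy I am asking about.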

And free -m reports

$ ssh compute-1-5 free -m
             total       used       free     shared    buffers     cached
Mem:         16050      12063       3986          0          7       1258
-/+ buffers/cache:      10798       5251
Swap:          996          0        996

If I sum up the %MEM values that ps reports for the other processes, together with this one they add up to the roughly 10GB of usage reported by 'free', so it appears that this R process really is using about 7.6GB.
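The arithmetic behind that cross-check can be sketched as follows, using only the numbers from the free -m and ps output above. The function names are illustrative.

```python
def used_by_applications(used_mb, buffers_mb, cached_mb):
    """Memory in use once buffers and page cache are excluded (the '-/+ buffers/cache' row)."""
    return used_mb - buffers_mb - cached_mb


def pct_of_total(pct_mem, total_mb):
    """Convert a ps %MEM value into MB of physical RAM."""
    return pct_mem / 100 * total_mb


total_mb, used_mb = 16050, 12063   # 'Mem:' row of free -m
buffers_mb, cached_mb = 7, 1258

app_used_mb = used_by_applications(used_mb, buffers_mb, cached_mb)
r_process_mb = pct_of_total(48.5, total_mb)  # %MEM of the R process from ps

print("used by applications (MB):", app_used_mb)
print("R process (MB):           ", r_process_mb)
print("everything else (MB):     ", app_used_mb - r_process_mb)
```

The first figure reproduces the 10798 in the "-/+ buffers/cache" row, and the R process accounts for most of it.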

So I guess my real question is: are the resource usage values reported by qstat and the job email accurate, or do I need to gather the metrics elsewhere?



Mike Hanby
mhanby at uab.edu
Information Systems Specialist II
IT HPCS / Research Computing

