[GE users] jobs killed - job ... exceeds job hard limit "h_vmem" of queue ...

vinc17 vincent-sge at vinc17.org
Wed Aug 12 01:56:11 BST 2009



Hi,

Jobs are often killed with an error like:

  job 15476 exceeds job hard limit "h_vmem" of queue
  "cas at volla.lip.ens-lyon.fr" (540127232.00000 > limit:536870912.00000)
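
For scale, the two byte counts in that message can be converted to MiB.
This is only a quick arithmetic sketch in Python; the variable names are
mine and are not anything defined by Grid Engine:

```python
# Byte counts copied verbatim from the error message above.
usage_bytes = 540127232
limit_bytes = 536870912

MIB = 1024 * 1024

limit_mib = limit_bytes / MIB
usage_mib = usage_bytes / MIB
excess_mib = (usage_bytes - limit_bytes) / MIB

print(f"limit : {limit_mib:.1f} MiB")
print(f"usage : {usage_mib:.1f} MiB")
print(f"excess: {excess_mib:.1f} MiB")
```

So the limit is exactly 512 MiB, and the reported usage exceeds it by
about 3.1 MiB.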

However, at the time it is killed, such a job is using very little memory.

For instance, here's an excerpt of the log of one of the jobs
(I've included ps output in the logs, for all the processes I
own):

------------------------------------------------------------------------
  RSS    SZ    SZ    VSZ CMD
 6848  4464 33165 132660 perl -S /var/spool/sge/milip/volla/job_scripts/15478 -s=tomate --log --logtkill -r=1 -c=3 -l=4 -C --pgen=8 -o=-dmaple
 6848  4464 33165 132660 perl -S /var/spool/sge/milip/volla/job_scripts/15505 -s=tomate --log --logtkill -r=1 -c=3 -l=4 -C --pgen=8 -o=-dmaple
 6848  4464 33165 132660 perl -S /var/spool/sge/milip/volla/job_scripts/15517 -s=tomate --log --logtkill -r=1 -c=3 -l=4 -C --pgen=8 -o=-dmaple
  496   552  1335   5340 ./tst-volla-1249985804-17760
 6848  4464 33165 132660 perl -S /var/spool/sge/milip/volla/job_scripts/15518 -s=tomate --log --logtkill -r=1 -c=3 -l=4 -C --pgen=8 -o=-dmaple
  500   552  1335   5340 ./tst-volla-1249996409-19447
  500   552  1335   5340 ./tst-volla-1249935899-13082
 5312  3628 27449 109796 perl ./test32f -b6790 -l4 -c -m3 -dmaple -n32 def/def-p1961 111010100001100000000000000000000000000000000000000000 0 1099511627776 tmp/tst-volla-1249998089-19796 300
  848   604 22574  90296 ps -o rss,size,sz,vsz,cmd

[2009-08-11 13:47:37] Maple: startmaple (@maple = maple -q -s)
[2009-08-11 13:47:37] Maple: Maple started, pid = 19948
  RSS    SZ    SZ    VSZ CMD
 6848  4464 33165 132660 perl -S /var/spool/sge/milip/volla/job_scripts/15478 -s=tomate --log --logtkill -r=1 -c=3 -l=4 -C --pgen=8 -o=-dmaple
 6848  4464 33165 132660 perl -S /var/spool/sge/milip/volla/job_scripts/15505 -s=tomate --log --logtkill -r=1 -c=3 -l=4 -C --pgen=8 -o=-dmaple
 6848  4464 33165 132660 perl -S /var/spool/sge/milip/volla/job_scripts/15517 -s=tomate --log --logtkill -r=1 -c=3 -l=4 -C --pgen=8 -o=-dmaple
  496   552  1335   5340 ./tst-volla-1249985804-17760
 6848  4464 33165 132660 perl -S /var/spool/sge/milip/volla/job_scripts/15518 -s=tomate --log --logtkill -r=1 -c=3 -l=4 -C --pgen=8 -o=-dmaple
  500   552  1335   5340 ./tst-volla-1249996409-19447
  500   552  1335   5340 ./tst-volla-1249935899-13082
 5488  3760 27482 109928 perl ./test32f -b6790 -l4 -c -m3 -dmaple -n32 def/def-p1961 111010100001100000000000000000000000000000000000000000 0 1099511627776 tmp/tst-volla-1249998089-19796 300
 1332   544  2221   8884 /opt/maple/bin.X86_64_LINUX/cmaple -I /opt/maple/lib/include -q -s
 2748  1728 25374 101496 /opt/maple/bin.X86_64_LINUX/mserver -kpipe 4 -I /opt/maple/lib/include -q -s
 1176   492   659   2636 /opt/maple/bin.X86_64_LINUX/mfsd 7 10
  852   604 22574  90296 ps -o rss,size,sz,vsz,cmd

Maple 10.06, X86 64 LINUX, Oct 2 2006 Build ID 255401
------------------------------------------------------------------------
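
To compare the ps figures with the limit, the VSZ column (which ps
reports in KiB) can be totalled. The sketch below rests on two
assumptions that nothing in this post confirms: that the limit is
checked against the combined virtual size of a job's processes, and
that one job consists of one job-script perl process plus the test32f,
Maple and ps processes from the second snapshot. The helper name
total_vsz_kib and the shortened command names are mine.

```python
# Rows taken from the second ps snapshot: VSZ in KiB, then a shortened
# command name. ASSUMPTION: these six processes make up a single job.
SNAPSHOT = """\
132660 perl -S job_script
109928 perl ./test32f
  8884 cmaple
101496 mserver
  2636 mfsd
 90296 ps
"""

def total_vsz_kib(text):
    """Sum the leading VSZ field (KiB) of each non-empty line."""
    return sum(int(line.split()[0])
               for line in text.splitlines() if line.strip())

total_kib = total_vsz_kib(SNAPSHOT)
limit_kib = 536870912 // 1024  # h_vmem limit from the error message

print(f"total VSZ: {total_kib} KiB ({total_kib / 1024:.1f} MiB)")
print(f"limit    : {limit_kib} KiB ({limit_kib / 1024:.1f} MiB)")
print(f"headroom : {limit_kib - total_kib} KiB")
```

Under these assumptions the total is 445900 KiB, which is still below
the 524288 KiB limit, but the margin is much smaller than any single
row suggests.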

But each time a job is killed (by SIGUSR2), the log stops at the line
"Maple: Maple started, pid = ...", at which point there still seems to be
enough memory. Does anyone know what's wrong?

-- 
Vincent Lefèvre <vincent at vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
