[SGE-discuss] Cgroups in SoGE 8.1.7 (SL65)

Arnau Bria listsarnau at gmail.com
Fri Jun 20 12:08:03 BST 2014


Hi all,

I need some help configuring cgroups in SoGE 8.1.7 on Scientific
Linux 6.5 (2.6.32-431.11.2.el6.x86_64).

I've (more or less) followed the instructions found
in ./source/libs/uti2/sge_cgroup.c and the sge_conf man page.

server:

execd_params                 ENABLE_BINDING=true,S_DESCRIPTORS=10000, \
                             H_DESCRIPTORS=10000, USE_SMAPS, USE_CGROUPS
                                                             ^^^^^^^^^^^

On the execution node, I've added the following to /etc/init.d/sgeexecd.crg:

[...]
  $bin_dir/sge_execd
  /opt/sge/util/resources/scripts/setup-cgroups-etc start
[....]


I start the daemon and the cpuset is created:
# ls -lsa /dev/cpuset/sge/
total 0
0 drwxr-xr-x 2 sgeadmin root 0 jun 20 12:24 .
0 drwxr-xr-x 3 root     root 0 jun 20 12:24 ..
0 --w--w--w- 1 sgeadmin root 0 jun 20 12:24 cgroup.event_control
0 -rw-r--r-- 1 sgeadmin root 0 jun 20 12:24 cgroup.procs
0 -rw-r--r-- 1 sgeadmin root 0 jun 20 12:26 cpuset.cpu_exclusive
0 -rw-r--r-- 1 sgeadmin root 0 jun 20 12:26 cpuset.cpus
[...]


I submit a simple job like:

$ qsub -b y -q test  -binding linear:1 sleep 1000

But the job fails with the following errors:

06/19/2014 16:36:07|  main|vmtest1|E|shepherd of job 15.1 died through signal = 6
06/19/2014 16:36:07|  main|vmtest1|E|abnormal termination of shepherd for job 15.1: no "exit_status" file
06/19/2014 16:36:07|  main|vmtest1|E|can't open file active_jobs/15.1/error: No such file or directory
06/19/2014 16:36:07|  main|vmtest1|E|can't open pid file "active_jobs/15.1/pid" for job 15.1

It seems that it is not able to create the new cpuset (the directory belongs
to sgeadmin):
# adminuser=$(awk '/^admin_user/ {print $2}' $SGE_ROOT/$SGE_CELL/common/bootstrap)
# echo $adminuser
sgeadmin
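
One way to narrow this down might be to look at who owns the sge cpuset
directory and whether its cpuset.cpus and cpuset.mems files are populated,
since the kernel refuses to attach tasks to a cpuset whose CPU or memory-node
list is empty. The following is only a minimal sketch under assumptions: the
function name check_cpuset is hypothetical (it is not part of SoGE), it relies
on GNU stat, and cpuset.mems is assumed to be among the entries elided from
the listing above.

```shell
#!/bin/sh
# check_cpuset DIR
# Hypothetical diagnostic helper (not part of SoGE): prints the owner of DIR
# and whether cpuset.cpus and cpuset.mems contain a value.
# Returns 0 if both files are populated, 1 otherwise.
check_cpuset() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # GNU stat; %U prints the name of the owning user.
    echo "owner: $(stat -c %U "$dir")"
    rc=0
    for f in cpuset.cpus cpuset.mems; do
        # cgroup files report a size of 0, so read the content
        # instead of testing with -s.
        val=$(cat "$dir/$f" 2>/dev/null)
        if [ -n "$val" ]; then
            echo "$f: $val"
        else
            echo "$f: empty"
            rc=1
        fi
    done
    return $rc
}

# Example (path taken from the listing above):
#   check_cpuset /dev/cpuset/sge
```

If either file is reported as empty, that would be one possible explanation
for tasks not being accepted into the cpuset, though it does not by itself
explain the shepherd abort.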


Also, every time I restart sgeexecd I need to restart cgconfig in
order to remove /cgroup/cpuset/sge. Is this expected?
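
For reference, a cgroup directory can normally be removed with a plain rmdir
once it has no tasks attached and no child groups, so restarting cgconfig may
not be the only way to clean up. The sketch below removes a leftover hierarchy
deepest-first. The function name remove_cgroup_tree is hypothetical (it is not
part of SoGE), and whether this matches the cleanup SoGE intends is an
assumption, not something confirmed here.

```shell
#!/bin/sh
# remove_cgroup_tree DIR
# Hypothetical helper (not part of SoGE): removes DIR and its
# sub-directories deepest-first using rmdir. On a cgroup filesystem,
# rmdir fails for any group that still has tasks attached.
# Returns 0 if DIR no longer exists afterwards, non-zero otherwise.
remove_cgroup_tree() {
    dir=$1
    # Nothing to do if the directory is already gone.
    [ -d "$dir" ] || return 0
    # -depth makes find visit children before their parents.
    find "$dir" -depth -type d -exec rmdir {} \;
    # Succeed only if the top-level directory was removed.
    [ ! -d "$dir" ]
}

# Example (path taken from the text above):
#   remove_cgroup_tree /cgroup/cpuset/sge
```

A non-zero return would suggest that some task is still attached to one of
the groups, which would itself be useful to know.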

I've been reading some old discussions about this and I thought that I
needed Mark Dixon's patches, but the 3 he provided last October
(http://arc.liv.ac.uk/pipermail/sge-bugs/2013-October/subject.html#630)
are already present in the latest SoGE version...


What is wrong? Is some configuration missing?
Could someone help me configure SoGE to use cpusets?

TIA,
Arnau

