Custom Query (431 matches)


Results (76 - 78 of 431)

Ticket Resolution Summary Owner Reporter
#1480 fixed Prevent root-owned files in execd active_job spool area markdixon
Description

The new cgroup/cpuset code uses a couple of routines that switch the effective uid/gid, and these appear to be causing problems.

Side effects include the following files in the execd spool sometimes ending up owned by root:

  • active_jobs/<JID>.<TASK>/config
  • active_jobs/<JID>.<TASK>/environment
  • active_jobs/<JID>.<TASK>/pe_hostfile
  • active_jobs/<JID>.<TASK>/<NUM>.<HOST>/

That last entry is a directory created for a SLAVE task. When it is owned by root, jobs can fail with a "can't open pid file" error message.
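To check a spool area for this symptom, one can list the entries whose owner uid is 0. The following is a minimal sketch assuming a POSIX system; count_root_owned is a hypothetical name and is not part of the attached patch. It only looks one level deep, so it would be run on active_jobs/<JID>.<TASK> and again on any <NUM>.<HOST> subdirectory.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical diagnostic: print and count the entries directly inside
 * dir (e.g. an active_jobs/<JID>.<TASK> directory) whose owner uid is 0.
 * lstat() reports symlinks themselves rather than their targets.
 * Returns the number found, or -1 if dir cannot be opened. */
static int count_root_owned(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    struct stat st;
    char path[4096];          /* assumed large enough for spool paths */
    int count = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;         /* skip names that would be truncated */
        if (lstat(path, &st) == 0 && st.st_uid == 0) {
            printf("root-owned: %s\n", path);
            count++;
        }
    }
    closedir(d);
    return count;
}
```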

The execd appears to already have the correct euid/egid when it enters the cgroup code, so I have removed the offending function calls. There may be a good reason for them that my limited testing has not revealed.
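The ticket doesn't show exactly what the removed routines did. As a generic illustration of why such switches are fragile, here is a minimal sketch of the save/restore discipline they would need. become_root, restore_ids and struct saved_ids are hypothetical names, and the code assumes a process whose real or saved uid is 0, as for a daemon started by root that normally runs with another user's effective ids.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helpers. Assumes the real or saved uid is 0 (a daemon
 * started by root), otherwise seteuid(0) fails with EPERM. */
struct saved_ids {
    uid_t euid;
    gid_t egid;
};

/* Record the current effective ids, then switch the euid to root.
 * Files created from here on are owned by uid 0. */
static int become_root(struct saved_ids *s)
{
    s->euid = geteuid();
    s->egid = getegid();
    return seteuid(0);
}

/* Put the recorded ids back. The egid is restored first, while the
 * process may still be privileged; the euid is restored last and is
 * attempted even if the egid step failed, so a partial failure does
 * not leave the process running as root. */
static int restore_ids(const struct saved_ids *s)
{
    int gid_rc = setegid(s->egid);
    int uid_rc = seteuid(s->euid);
    return (gid_rc == 0 && uid_rc == 0) ? 0 : -1;
}
```

A caller would pair the two on every path, e.g. `if (become_root(&ids) == 0) { /* write to the cgroup filesystem */ restore_ids(&ids); }`. An early return between the two calls would leave every later spool file owned by root.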

Potential patch attached.

Cheers,

Mark --


Mark Dixon Email : m.c.dixon@… HPC/Grid Systems Support Tel (int): 35429 Information Systems Services Tel (ext): +44(0)113 343 5429 University of Leeds, LS2 9JT, UK


0001-Prevent-root-owned-files-in-execd-active_job-spool-a.patch

#1478 fixed Installation Bugs Dave Love <d.love@…> tobeychris@…
Description

Hello Everyone,

I would like to report the following two issues that I encountered when installing SoGE this week.

1) When using the GUI to install the qmaster, selecting 'classic' as the spooling method leaves the 'select hosts' page blank.

2) When installing dbwriter with a PostgreSQL backend, the startup script default/common/sgedbwriter fails when run.

For 1), I am not sure what is wrong: if you go back, select BerkeleyDB and go forward again, it works. There doesn't seem to be a way to get local spooling installed through the SoGE GUI (the OGS GUI works). I had to install the qmaster with inst_sge first; after that I could install the execution hosts (with local spooling) through the GUI.

Here is the error printed in the terminal from which the installer was launched:

Starting Installer ...

java.lang.NullPointerException
    at java.util.Hashtable.put(Hashtable.java:542)
    at java.util.Properties.setProperty(Properties.java:161)
    at com.izforge.izpack.installer.AutomatedInstallData.setVariable(Unknown Source)
    at com.sun.grid.installer.gui.HostPanel.panelActivate(HostPanel.java:694)
    at com.izforge.izpack.installer.InstallerFrame.switchPanel(Unknown Source)
    at com.izforge.izpack.installer.InstallerFrame.navigateNext(Unknown Source)
    at com.izforge.izpack.installer.InstallerFrame.navigateNext(Unknown Source)
    at com.izforge.izpack.installer.InstallerFrame$NavigationHandler.actionPerformed(Unknown Source)
    at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)
    at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2341)
    at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
    at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
    at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:252)
    at java.awt.Component.processMouseEvent(Component.java:6505)
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3312)
    at java.awt.Component.processEvent(Component.java:6270)
    at java.awt.Container.processEvent(Container.java:2229)
    at java.awt.Component.dispatchEventImpl(Component.java:4861)
    at java.awt.Container.dispatchEventImpl(Container.java:2287)
    at java.awt.Component.dispatchEvent(Component.java:4687)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4832)
    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4492)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4422)
    at java.awt.Container.dispatchEventImpl(Container.java:2273)
    at java.awt.Window.dispatchEventImpl(Window.java:2719)
    at java.awt.Component.dispatchEvent(Component.java:4687)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:735)
    at java.awt.EventQueue.access$200(EventQueue.java:103)
    at java.awt.EventQueue$3.run(EventQueue.java:694)
    at java.awt.EventQueue$3.run(EventQueue.java:692)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
    at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:87)
    at java.awt.EventQueue$4.run(EventQueue.java:708)
    at java.awt.EventQueue$4.run(EventQueue.java:706)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:705)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:146)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:138)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:91)

For 2), I believe the problem is the placement of the two semicolons (the ";;" that ends a case branch) in the template file "util/sgedbwriter_template", as follows:

Line 554:

  - [ -f /var/lock/subsys/sgedbwriter ] && rm -f /var/lock/subsys/sgedbwriter;;
  + [ -f /var/lock/subsys/sgedbwriter ] && rm -f /var/lock/subsys/sgedbwriter
  + ;;

This seems to have fixed the issue for me and I was able to install the dbwriter.

Some information about my setup:

Systems:

5 x Haswell 4770k with 32GB RAM

CentOS 6.4 (64bit) with X11

Son of Grid Engine 8.1.5 RPMs and tarballs, downloaded directly from the site.

SGE_ROOT is on an NFSv3 share on a separate server (hence the local spooling).

Thanks,

-Chris

#1477 fixed Prevent SGE_BINDING variable having a spurious space at start markdixon
Description

Hi Dave,

I've been looking at your cpuset support on Linux where USE_CGROUP=1 and in the process noticed that:

  • Where a job was not allocated CPU 0, the SGE_BINDING string always starts with a space.

  • SGE_BINDING is subsequently used in shepherd.c's cpusetting() to generate the cpuset binding string. The spurious space is translated into a comma, which causes the assignment of CPUs to the cpuset to fail.
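The ticket doesn't show the code that builds SGE_BINDING, so the following is only a minimal sketch of the general way to avoid a leading separator: emit the separator before every entry except the first one written, instead of keying it off a particular CPU id. The function format_binding and its signature are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the allocated CPU ids as a space-separated
 * list, e.g. {2, 3} -> "2 3". The separator goes before every entry
 * except the first one written, so the string never starts with a
 * space, whichever CPUs the job was given. Returns 0, or -1 if buf is
 * too small. */
static int format_binding(const int *cpus, size_t n, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + used, len - used, "%s%d",
                         i == 0 ? "" : " ", cpus[i]);
        if (w < 0 || (size_t)w >= len - used)
            return -1;        /* output would have been truncated */
        used += (size_t)w;
    }
    return 0;
}
```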

The attached patch ought to fix the immediate issue, although I wonder if the shepherd should perhaps be interpreting the "binding" string in its configuration file instead.
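On the consumer side, a conversion that treats whitespace purely as a separator would tolerate a stray leading space. This is a minimal sketch of that idea, not the shepherd's actual cpusetting(); the name binding_to_cpulist is hypothetical, and it assumes the input holds only CPU ids separated by whitespace. The comma-separated output matches the list form a cpuset "cpus" file accepts.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy a space-separated CPU list such as " 2 3"
 * into comma-separated form ("2,3"). Leading, trailing and repeated
 * whitespace never produces an empty entry, because a comma is only
 * written just before the next non-space character, and only once
 * something has already been written. Returns 0, or -1 if out is too
 * small. */
static int binding_to_cpulist(const char *in, char *out, size_t len)
{
    size_t used = 0;
    int need_comma = 0;

    if (len == 0)
        return -1;
    while (*in) {
        if (isspace((unsigned char)*in)) {
            if (used > 0)
                need_comma = 1;   /* separator between two entries */
            in++;
            continue;
        }
        if (need_comma) {
            if (used + 1 >= len)
                return -1;
            out[used++] = ',';
            need_comma = 0;
        }
        if (used + 1 >= len)      /* keep room for the terminating NUL */
            return -1;
        out[used++] = *in++;
    }
    out[used] = '\0';
    return 0;
}
```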

Cheers,

Mark --


Mark Dixon Email : m.c.dixon@… HPC/Grid Systems Support Tel (int): 35429 Information Systems Services Tel (ext): +44(0)113 343 5429 University of Leeds, LS2 9JT, UK


0001-Prevent-SGE_BINDING-variable-having-a-spurious-space.patch
