[GE users] Setting memlock limit with SGE 6.2; was: Re: [GE users] worth a wiki entry for SGE with OpenMPI and Infiniband

Erik Soyez E.Soyez at science-computing.de
Mon Jul 21 09:37:19 BST 2008

Thanks Andy, cool!  What about the umask - any similar hack available?

:-)Erik Soyez.

On Mon, 21 Jul 2008, Andy Schwierskott wrote:

> Hi,
> just on a side note regarding the 'memlock' resource limit issue which is
> reported here sometimes: For SGE 6.2 (it's not part of the Beta and Beta
> refresh however) we added as a last minute feature the ability to configure
> the memlock limit and a few others on the execd level, i.e. via the
> 'execd_params' cluster config setting.
> Full background: in the SGE queue config you can configure most but not all
> Unix resource limits, like CPU time, max. virtual memory and so on. There are
> a few others, like the maximum file descriptor limit, which exist on virtually
> all OSes, and some which exist on just one or a few OSes (like the "memlock"
> limit). It was too late to extend the queue configuration, and we found the
> workaround of configuring these limits indirectly by hacking the SGE execd
> startup scripts (this is the chain through which the job inherits such limits
> if they are not set) too implicit and error-prone. Therefore we decided to
> enable an admin to set such limits via the execd_params setting.
> It's not a 100% perfect solution: it's an execd setting valid for all jobs
> running in all queues on that host, and it does not provide a way to use
> the configured system-wide limits set e.g. in /etc/security/limits.conf on
> Linux. Nevertheless it's much better than having to edit the job scripts
> or the execd startup scripts, which could get overwritten by an update and
> would not take effect if the execd is started directly for testing purposes,
> e.g. in debug mode.
> For the interested reader, here's an excerpt from the SGE 6.2 sge_conf(5) man
> page which describes the syntax and semantics of these settings:
>          Specifies soft and hard resource limits as implemented
>          by the setrlimit(2) system call. See this manual page
>          on your system for more information. These parameters
>          complete the list of limits set by the RESOURCE LIMITS
>          parameter of the queue configuration as described in
>          queue_conf(5). Unlike the resource limits in the queue
>          configuration, these resource limits are set for every
>          job on this execution host. If a value is not specified,
>          the resource limit is inherited from the execution
>          daemon process. Because this would lead to unpredicted
>          results, if only one limit of a resource is set (soft
>          or hard), the corresponding other limit is set to the
>          same value.
>
>          S_DESCRIPTORS and H_DESCRIPTORS specify a value one
>          greater than the maximum file descriptor number that
>          can be opened by any process of a job.
>
>          S_MAXPROC and H_MAXPROC specify the maximum number of
>          processes that can be created by the job user on this
>          execution host.
>
>          S_MEMORYLOCKED and H_MEMORYLOCKED specify the maximum
>          number of bytes of virtual memory that may be locked
>          into RAM.
>
>          S_LOCKS and H_LOCKS specify the maximum number of file
>          locks any process of a job may establish.
>
>          All of these values can be specified using the multiplier
>          letters k, K, m, M, g and G, see sge_types(1) for
>          details.
> So you would simply set
>  execd_params H_MEMORYLOCKED=unlimited
> to set the soft and hard Linux "memlock" limit to unlimited.
> On OS'es which do not support one of these limits the setting will be
> silently ignored.
> There's still a gotcha: if you use the old interactive job support instead
> of the new builtin default (that is, qrsh without a command, which calls the
> system rlogind; qlogin, which uses the system telnetd; and likely ssh(d)),
> the SGE setting gets overridden, since those daemons adhere to
> /etc/security/limits.conf on Linux and are started after the shepherd sets
> those limits.
> For SGE 6.1 and earlier the best workaround in my opinion is to set those
> limits in the execd startup script. At least this avoids different behavior
> depending on whether the execd is started at system boot time or later by an
> interactively logged-in root user. As stated above, care has to be taken when
> the execd startup script is changed, a new execd is installed, or the execd
> is started directly without using the startup script.
> Andy
> On Sun, 20 Jul 2008, John Leidel wrote:
>> I second Joe's motion.  I've done this for quite some time manually by
>> creating a set of startup/pre/post wrapper scripts such that...
>> for a in "$SGE_ROOT"/scripts/pre/*; do
>>    [ -x "$a" ] && "$a"   # run each script in turn; exec would replace the shell
>> done
>> ....blah blah blah
>> cheers
>> john
>> On Sun, Jul 20, 2008 at 9:43 AM, Joe Landman
>> <landman at scalableinformatics.com> wrote:
>>> Hi folks
>>> On a related note, for this same cluster, we were using Infiniband. One of
>>> the issues with OpenMPI and SGE is that the maximum locked memory (on Linux)
>>> is set way too low for Infiniband, and it can't lock enough memory.  You can
>>> "fix" this with settings in /etc/security/limits.conf, simply add these two
>>> lines to the file
>>>        *               soft    memlock unlimited
>>>        *               hard    memlock unlimited
>>> However, it appears that this works for running OpenMPI over Infiniband apps
>>> by hand, but not through SGE.  I found that I needed to insert an
>>>        ulimit -l unlimited
>>> in the SGE execd run script, right near the top, or
>>>        qrsh ulimit -l
>>> would always return 32 (kilobytes), and the Infiniband based job wouldn't
>>> run.
>>> I would like to suggest including a line like this in your execd startup
>>> script.
>>> For the SGE developers, if you could include an environment
>>> startup/scripting/tweaking section right before you fire off the main
>>> sgeexecd process, this could help with other (future) issues like this.
>>> Might be worth creating an $SGE/execd_environment directory to contain the
>>> scripts/settings we need.
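
Joe's `qrsh ulimit -l` test above can be turned into a small check at the top of a job script, so that a too-low memlock limit is reported before the Infiniband job starts. This is only a sketch under stated assumptions: the function name memlock_ok and the 65536 KB threshold are made up for illustration (they are not SGE or OpenMPI values), and `ulimit -l` is a shell extension rather than POSIX, so a shell that supports it (bash, for example) is assumed.

```shell
#!/bin/sh
# memlock_ok VALUE MIN_KB
# VALUE is what `ulimit -l` prints: a number of kilobytes or "unlimited".
# MIN_KB is a hypothetical site-specific threshold, not an SGE default.
# Returns 0 if the limit is sufficient, 1 if too low, 2 if not parseable.
memlock_ok() {
    value=$1
    min_kb=$2
    case "$value" in
        unlimited) return 0 ;;
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$value" -ge "$min_kb" ]
}

# Report the soft limit this shell (and hence the job) actually inherited.
current=$(ulimit -S -l 2>/dev/null || echo unknown)
if memlock_ok "$current" 65536; then
    echo "memlock soft limit looks fine: $current"
else
    echo "memlock soft limit too low or unknown: $current" >&2
fi
```

With the value 32 that Joe saw, `memlock_ok 32 65536` fails, which is the case the execd_params setting or the `ulimit -l unlimited` line in the execd startup script is meant to fix.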

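For the multiplier letters mentioned in the man page excerpt, here is a rough sketch of how a value such as 64M would map to bytes. It assumes the convention described in sge_types(1) as far as I know it: lowercase letters are powers of 1000 and uppercase letters are powers of 1024. Please verify that against the man page on your installation. The function name sge_mem_to_bytes is hypothetical, and the sketch only handles plain decimal integers plus the word "unlimited" used in Andy's example.

```shell
#!/bin/sh
# sge_mem_to_bytes VALUE
# Prints VALUE converted to bytes, or "unlimited" unchanged.
# Assumed multipliers (check sge_types(1)): k=1000, K=1024,
# m=1000^2, M=1024^2, g=1000^3, G=1024^3.
sge_mem_to_bytes() {
    value=$1
    if [ "$value" = "unlimited" ]; then
        echo unlimited
        return 0
    fi
    num=${value%[kKmMgG]}      # strip one trailing multiplier letter, if any
    suffix=${value#"$num"}     # whatever was stripped (may be empty)
    case "$num" in
        ''|*[!0-9]*) echo "invalid value: $value" >&2; return 1 ;;
    esac
    case "$suffix" in
        '') mult=1 ;;
        k)  mult=1000 ;;
        K)  mult=1024 ;;
        m)  mult=1000000 ;;
        M)  mult=1048576 ;;
        g)  mult=1000000000 ;;
        G)  mult=1073741824 ;;
    esac
    echo $((num * mult))
}

sge_mem_to_bytes 64M    # prints 67108864
sge_mem_to_bytes 64m    # prints 64000000
```

So, under that assumption, H_MEMORYLOCKED=64M and H_MEMORYLOCKED=64m would differ by about 3 MB, which is worth keeping in mind when comparing against `ulimit -l` output in kilobytes.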
