[GE users] Setting memlock limit with SGE 6.2; was: Re: [GE users] worth a wiki entry for SGE with OpenMPI and Infiniband

Andy Schwierskott andy.schwierskott at sun.com
Mon Jul 21 11:17:02 BST 2008

Hi Erik,

we don't have this, just as we have no means to set a default $PATH other
than the compiled-in path. The umask is hardcoded to 022.


On Mon, 21 Jul 2008, Erik Soyez wrote:

> Thanks Andy, cool!  What about the umask - any similar hack available? :-)
> Erik Soyez.
> On Mon, 21 Jul 2008, Andy Schwierskott wrote:
>> Hi,
>> just on a side note regarding the 'memlock' resource limit issues which are
>> reported here sometimes: For SGE 6.2 (it's not part of the Beta and Beta
>> refresh, however) we added as a last-minute feature the ability to configure
>> the memlock limit and a few others on the execd level, i.e. via the
>> 'execd_params' cluster config setting.
>> Full background: in the SGE queue config you can configure most but not all
>> Unix resource limits like CPU time, max. virtual memory and so on. There are
>> a few others, like the maximum file descriptor limit, which exist on
>> virtually all OSes, and some which just exist on one or a few OSes (like
>> the "memlock" limit). It was too late to extend the queue configuration, and
>> we found the workaround of configuring these limits indirectly by hacking
>> the SGE execd startup scripts (this is the chain by which the job inherits
>> such limits if they are not set) too implicit and error prone. We therefore
>> decided to enable an admin to set such limits via the execd_params setting.
>> It's not a 100% perfect solution: it's an execd setting valid for all jobs
>> running in all queues on that host, and it does not provide a way to use
>> the system-wide limits configured e.g. in /etc/security/limits.conf on
>> Linux. Nevertheless it's much better than having to edit the job scripts
>> or the execd startup scripts, which could get overwritten by an update and
>> would not take effect if, for testing purposes, the execd is started
>> directly, e.g. in debug mode.
>> For the interested reader here's an excerpt from the SGE 6.2 sge_conf(5)
>> man page which describes the syntax and semantics of these settings:
>>          Specifies soft and hard resource limits as implemented by the
>>          setrlimit(2) system call. See this manual page on your system
>>          for more information. These parameters complete the list of
>>          limits set by the RESOURCE LIMITS parameter of the queue
>>          configuration as described in queue_conf(5). Unlike the
>>          resource limits in the queue configuration, these resource
>>          limits are set for every job on this execution host. If a
>>          value is not specified, the resource limit is inherited from
>>          the execution daemon process. Because this would lead to
>>          unpredicted results, if only one limit of a resource is set
>>          (soft or hard), the corresponding other limit is set to the
>>          same value.
>>
>>          S_DESCRIPTORS and H_DESCRIPTORS specify a value one greater
>>          than the maximum file descriptor number that can be opened by
>>          any process of a job.
>>
>>          S_MAXPROC and H_MAXPROC specify the maximum number of
>>          processes that can be created by the job user on this
>>          execution host.
>>
>>          S_MEMORYLOCKED and H_MEMORYLOCKED specify the maximum number
>>          of bytes of virtual memory that may be locked into RAM.
>>
>>          S_LOCKS and H_LOCKS specify the maximum number of file locks
>>          any process of a job may establish.
>>
>>          All of these values can be specified using the multiplier
>>          letters k, K, m, M, g and G, see sge_types(1) for details.
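
A minimal sketch of the two rules in the excerpt above: the multiplier
letters, and the mirroring of a lone soft or hard value onto its counterpart.
The function names are made up for illustration and are not part of SGE. The
multiplier values (k=1000, K=1024 and so on) are my reading of sge_types(1),
so check the man page on your own installation.

```shell
# Hypothetical helpers, not part of SGE.

# Convert a value such as "32K", "2m" or "unlimited" to a byte count.
# Multipliers assumed from sge_types(1): lower case is decimal, upper case binary.
to_bytes() {
    value=$1
    case "$value" in
        unlimited) echo unlimited; return 0 ;;
    esac
    number=${value%[kKmMgG]}      # strip one trailing multiplier letter, if any
    suffix=${value#"$number"}     # the letter that was stripped (may be empty)
    case "$number" in
        ''|*[!0-9]*) echo "bad value: $value" >&2; return 1 ;;
    esac
    case "$suffix" in
        '') factor=1 ;;
        k)  factor=1000 ;;
        K)  factor=1024 ;;
        m)  factor=$((1000 * 1000)) ;;
        M)  factor=$((1024 * 1024)) ;;
        g)  factor=$((1000 * 1000 * 1000)) ;;
        G)  factor=$((1024 * 1024 * 1024)) ;;
    esac
    echo $((number * factor))
}

# Print "soft hard" for one resource. If only one of the two is given, the
# other copies it; if neither is given, the job inherits the execd's limit.
resolve_pair() {
    soft=$1
    hard=$2
    if [ -z "$soft" ] && [ -z "$hard" ]; then
        echo "inherited inherited"
        return 0
    fi
    [ -n "$soft" ] || soft=$hard
    [ -n "$hard" ] || hard=$soft
    echo "$(to_bytes "$soft") $(to_bytes "$hard")"
}
```

For example, `resolve_pair "" unlimited` mirrors the hard value onto the soft
one, which is the case used in the H_MEMORYLOCKED example in this mail.
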
>> So you would simply set
>>  execd_params H_MEMORYLOCKED=unlimited
>> to set the soft and hard Linux "memlock" limit to unlimited.
>> On OSes which do not support one of these limits the setting will be
>> silently ignored.
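
One way to check that the setting took effect is to print the limit from
inside a job. A minimal sketch, assuming a Bourne-style shell whose `ulimit -l`
prints either kilobytes or "unlimited": `memlock_ok` is a made-up helper, and
the 65536 KB threshold is an arbitrary placeholder, not a value any Infiniband
stack is known to require.

```shell
# Hypothetical check for the top of a job script.
# $1: value as printed by 'ulimit -l' (kilobytes or "unlimited")
# $2: minimum acceptable value in kilobytes
memlock_ok() {
    case "$1" in
        unlimited) return 0 ;;
        ''|*[!0-9]*) return 2 ;;      # empty or not a number: cannot judge
    esac
    [ "$1" -ge "$2" ]
}

# Fall back to an empty value if this shell has no 'ulimit -l'.
current=$(ulimit -l 2>/dev/null) || current=""

if memlock_ok "$current" 65536; then
    echo "memlock limit looks fine: $current"
else
    echo "memlock limit too low or unknown: ${current:-n/a}" >&2
fi
```

Running it through the batch system rather than by hand is what matters here,
since the two paths can see different limits.
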
>> There's still a gotcha: if you use the old interactive job support and
>> not the default new builtin one (i.e. qrsh without a command, which calls
>> the system rlogind; qlogin, which uses the system telnetd; and likely
>> ssh(d)), the SGE setting would get overridden, since those daemons adhere to
>> /etc/security/limits.conf on Linux. They are started after the shepherd
>> sets those limits.
>> For SGE 6.1 and earlier the best workaround in my opinion is to set those
>> limits in the execd startup script. At least this eliminates differing
>> behavior depending on whether the execd is started at system boot time or
>> later by an interactively logged-in root user. As stated above, care has to
>> be taken when the execd startup script is changed, a new execd is installed
>> or the execd is started directly without using the startup script.
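
A sketch of what the 6.1 workaround could look like near the top of the execd
startup script. The function name is made up. Raising the hard limit generally
needs root, so the sketch warns and carries on instead of aborting when the
call is refused.

```shell
# Hypothetical snippet for the top of the execd startup script.
# The limit set here is inherited by the execd and, through it, by the jobs.
raise_memlock() {
    if ulimit -l unlimited 2>/dev/null; then
        echo "memlock limit raised to unlimited"
    else
        # Not permitted (e.g. not running as root): keep the current limit.
        echo "could not raise memlock limit, still $(ulimit -l 2>/dev/null)" >&2
    fi
    return 0
}

raise_memlock
```

Because the call sits in the script and not in the SGE configuration, it has
to be re-applied whenever the startup script is replaced.
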
>> Andy
>> On Sun, 20 Jul 2008, John Leidel wrote:
>>> I second Joe's motion.  I've done this for quite some time manually by
>>> creating a set of startup/pre/post wrapper scripts such that...
>>> for a in "$SGE_ROOT"/scripts/pre/*; do
>>>    # run each script in turn; a bare 'exec' would replace this shell
>>>    [ -x "$a" ] && "$a"
>>> done
>>> ....blah blah blah
>>> cheers
>>> john
>>> On Sun, Jul 20, 2008 at 9:43 AM, Joe Landman
>>> <landman at scalableinformatics.com> wrote:
>>>> Hi folks
>>>> On a related note, for this same cluster, we were using Infiniband. One of
>>>> the issues with OpenMPI and SGE is that the maximum locked memory (on Linux)
>>>> is set way too low for Infiniband, and it can't lock enough memory.  You can
>>>> "fix" this with settings in /etc/security/limits.conf; simply add these two
>>>> lines to the file
>>>>        *               soft    memlock unlimited
>>>>        *               hard    memlock unlimited
>>>> However, it appears that this works for running OpenMPI over Infiniband
>>>> apps by hand, but not through SGE.  I found that I needed to insert a
>>>>        ulimit -l unlimited
>>>> in the SGE execd run script, right near the top, or
>>>>        qrsh ulimit -l
>>>> would always return 32 (kilobytes), and the Infiniband based job wouldn't
>>>> run.
>>>> I would like to suggest including a line like this in your execd startup
>>>> script.
>>>> For the SGE developers, if you could include an environment
>>>> startup/scripting/tweaking section right before you fire off the main
>>>> sgeexecd process, this could help with other (future) issues like this.
>>>> Might be worth creating an $SGE/execd_environment directory to contain
>>>> the scripts/settings we need.
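
A sketch of how such a directory could be consumed just before the execd is
started. Everything here is hypothetical: neither the directory nor the
`load_execd_environment` function exists in SGE, and the `*.sh` naming is only
a convention picked for the example.

```shell
# Source every readable *.sh file in a directory, in glob (sorted) order,
# so that limits and variables set there apply to the calling shell and to
# the daemon it starts afterwards.
load_execd_environment() {
    dir=$1
    [ -d "$dir" ] || return 0          # nothing to do if the directory is absent
    for snippet in "$dir"/*.sh; do
        [ -r "$snippet" ] || continue  # also skips the unexpanded pattern
        . "$snippet"
    done
    return 0
}
```

A startup script would call it once, for instance as
`load_execd_environment "$SGE_ROOT/execd_environment"`, before launching the
daemon. A file in that directory containing `ulimit -l unlimited` would then
cover the memlock case discussed in this thread.
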