[GE users] /etc/security/limits.conf

Andy Schwierskott andy.schwierskott at sun.com
Mon Jun 25 13:34:38 BST 2007


Hi,

Running jobs are not affected when the execd is shut down and restarted.
The proper, suggested way would be:

    - disable the queue instances on the affected host
    - shutdown execd('s)
    - make the change to /etc/init.d/sgeexecd and, for safety reasons, to
      <sge-root>/<cell>/common/sgeexecd as well
    - restart execd('s)
    - enable affected queue instances.
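
The steps above could be scripted roughly as follows. This is only a sketch:
the function names and the DRY_RUN switch are made up for illustration, the
host name is whatever you pass in, and it assumes qmod and the sgeexecd
script are both usable from the machine where it runs (in practice the
sgeexecd calls have to happen on the execution host itself). By default it
only prints what it would do.

```shell
#!/bin/sh
# Sketch of the disable / stop / edit / start / enable sequence.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run a command, or just show it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: walk through the steps for one execution host.
restart_execd_on_host() {
    host="$1"
    # Disable all queue instances on the host so no new jobs get scheduled.
    run qmod -d "*@$host"
    # "softstop" is the mode mentioned in the quoted question; it is assumed
    # here to stop the execd while leaving running jobs alone.
    run /etc/init.d/sgeexecd softstop
    # (edit /etc/init.d/sgeexecd and <sge-root>/<cell>/common/sgeexecd here)
    run /etc/init.d/sgeexecd start
    # Re-enable the queue instances once the execd is back.
    run qmod -e "*@$host"
}
```

Calling restart_execd_on_host with a host name in dry-run mode lists the four
commands; setting DRY_RUN=0 would actually execute them.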

You also might think about how to ensure that installations of new execd
hosts or re-installations will automatically get the fixed startup script.
To achieve this you'd change the file

    <sge_root>/util/rctemplates/sgeexecd_template

and find a means to ensure that with future patch updates your local changes
do not get overridden.
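
One possible way to notice an overwritten change is to check the startup
scripts and the template for the ulimit line after every patch update. The
sketch below is an assumption about how that could look, not part of SGE; the
function names are made up, and the pattern only recognises the simple
"ulimit -l ..." and "ulimit -H -l ..." forms.

```shell
#!/bin/sh
# has_memlock_fix FILE
# Succeeds if FILE contains an uncommented "ulimit [-H|-S] -l ..." line.
has_memlock_fix() {
    grep -Eq '^[[:space:]]*ulimit([[:space:]]+-[HS])*[[:space:]]+-l[[:space:]]' "$1"
}

# check_startup_scripts FILE...
# Prints one status line per file; returns non-zero if any file lacks the fix.
check_startup_scripts() {
    status=0
    for f in "$@"; do
        if has_memlock_fix "$f"; then
            echo "ok: $f"
        else
            echo "MISSING: $f"
            status=1
        fi
    done
    return $status
}
```

You would pass it the paths that matter on your site, for example
/etc/init.d/sgeexecd, the copy under <sge-root>/<cell>/common and the
template file named above.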

The syntax "ulimit -a unlimited" does not seem to work (I tested it under
bash on Linux and under sh on Solaris). See the respective man pages. It
would be:

    ulimit -l unlimited
    (and all other limits you want to set)

and the really safe way would be to first set the hard, then the soft limit:

    ulimit -H -l unlimited
    ulimit -l unlimited

since a soft limit can't go beyond the hard limit.
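
To see the hard/soft relationship without touching your login shell, something
like the following can be run. It is a sketch under the assumption that your
shell's ulimit supports -l; the function name is made up. It tries to raise
the hard limit (which normally needs root), then sets the soft limit to
whatever the hard limit ended up being, all inside a subshell.

```shell
#!/bin/sh
# Try to raise the locked-memory limit, hard limit first, then soft.
# Everything runs in a subshell, so the caller's limits stay untouched.
show_memlock_after_raise() {
    (
        # Raising the hard limit usually requires root privileges.
        ulimit -H -l unlimited 2>/dev/null ||
            echo "note: could not raise the hard limit (not root?)" >&2
        # The soft limit may go up to the hard limit, but not beyond it.
        ulimit -S -l "$(ulimit -H -l)" 2>/dev/null || true
        echo "soft=$(ulimit -S -l) hard=$(ulimit -H -l)"
    )
}
```

As an unprivileged user the output typically shows both values equal to the
existing hard limit; as root both should report unlimited.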

In case you are going to change sgeexecd_template and
<sge-root>/<cell>/common/sgeexecd you need to be aware that the "-l" option
is Linux specific. So you'd need to properly protect any operating-system
specific settings with if/else if you run a heterogeneous SGE environment.
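
Such a guard might look like the sketch below. It assumes that "uname -s" is
good enough to tell the platforms apart and that only Linux should get the
"-l" setting; the function name is made up, and you would extend the case
statement for other systems you have verified yourself.

```shell
#!/bin/sh
# Succeeds if "ulimit -l" should be attempted for the given OS name.
# Defaults to the local "uname -s"; only Linux is listed because that is
# the only platform the note above vouches for.
wants_memlock_ulimit() {
    case "${1:-$(uname -s)}" in
        Linux) return 0 ;;
        *)     return 1 ;;
    esac
}

if wants_memlock_ulimit; then
    # Hard limit first, then soft; warn instead of aborting if not permitted.
    ulimit -H -l unlimited 2>/dev/null ||
        echo "warning: cannot raise hard locked-memory limit" >&2
    ulimit -S -l unlimited 2>/dev/null ||
        echo "warning: cannot raise soft locked-memory limit" >&2
fi
```

On any other platform the block is simply skipped, so the same script can be
shared across all execution hosts.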

Nevertheless, I'm wondering: is this really the correct way? Doesn't Linux
have a system-wide login-defaults file where all these settings should be
specified to ensure that programs started at boot time have a well-defined
limit setting?

Andy

> Hi Hristo,
>
> thanks for your reply...
>
>>> I had the same problem with OpenMPI and InfiniBand here and after
>>> some research I had to modify /etc/init.d/sgeexecd on each node. Just
>>> find
>>> the line:
>>>   ...
>>>   $bin_dir/sge_execd
>>>   ...
>>> (that's SN1GE 6.1 startup script but I think it should be pretty much
>>> the same for other versions)
>>> and put a ulimit command on the line before so it becomes:
>>>   ...
>>>   ulimit -a unlimited
>>>   $bin_dir/sge_execd
>>>   ...
>>
>> Sorry, not 'ulimit -a unlimited' but 'ulimit -l unlimited'.
>
> Can I try this solution while there are running jobs in the queue? I
> *assume* I should change /etc/init.d/sgeexecd at the proper line, and then
> do a "/etc/init.d/sgeexecd softstop", right? Since there are jobs from
> April in the queue, I cannot afford to lose them.
>
> thanks for your help.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
