[GE users] ulimits and large openmpi scaling - best way to alter limits for running under SGE?

templedf dan.templeton at sun.com
Tue Aug 4 15:09:59 BST 2009


Yep.  Sorry, missed the username.  Preaching to the choir... :)

Daniel

craffi wrote:
> Thanks Dan -
>
> The system is running 6.2u3 using the "builtin" rsh methods. Those are
> not bound by the usual sort of rsh limits, right?
>
> -Chris
>
>
>
> On Aug 4, 2009, at 9:50 AM, templedf wrote:
>
>   
>> Are you using rsh as the qrsh transport?  If so, that's your problem.
>> It's a port limit in rsh.  To avoid the issue, switch to either the
>> built-in interactive job support that was introduced with 6.2, or
>> switch to ssh.  Note that there are some caveats with ssh.  Basically,
>> you either have to use a customized ssh or live without slave
>> accounting and control.
>>
>> Daniel
>>
>> craffi wrote:
>>     
>>> Hi folks,
>>>
>>> When trying to run an openmpi job on 950+ CPU cores I always seem
>>> to hit this error:
>>>
>>>> mca_oob_tcp_accept: accept() failed: Too many open files (24).
>>>
>>> ... which clearly seems to be a system limit/ulimit issue. We are
>>> running RHEL5 on Intel Nehalem.
>>>
>>> Looking for the "most proper" way to deal with ulimit settings with
>>> SGE - I seem to recall there is a proper way to do this and I can't
>>> for the life of me remember. Do we put ulimit statements into
>>> submission scripts? Personal shell startup files? Bake them into the
>>> SGE daemon start scripts?
>>>
>>> Any tips on expanding/setting ulimits for running apps under SGE
>>> would be appreciated, thanks!
>>>
>>> -Chris
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210892
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>
>>>       
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210893
>>     
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210898
>
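
For anyone landing on this thread with the same "Too many open files"
error: before changing anything, it can help to confirm which open-file
limit the job process actually sees. The following is only a minimal
sketch, assuming Python is available on the execution hosts. The
function name raise_nofile_soft_limit is made up for illustration and
is not part of SGE or Open MPI. It reports the soft and hard
RLIMIT_NOFILE values and tries to raise the soft limit toward the hard
limit. It cannot raise the hard limit, which is inherited from the
parent process, and it does nothing about the rsh port limit discussed
above.

```python
import resource


def raise_nofile_soft_limit(wanted=None):
    """Try to raise the soft open-file limit; return (old_soft, new_soft, hard).

    Hypothetical helper for illustration only. `wanted` is an optional
    target for the soft limit; by default the hard limit is used.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY

    # Nothing to raise if the soft limit is already unlimited.
    if soft == unlimited:
        return soft, soft, hard

    if hard == unlimited:
        # Do not request "unlimited" blindly; only honour an explicit target.
        target = wanted if wanted is not None else soft
    else:
        # An unprivileged process may not exceed the hard limit.
        target = hard if wanted is None else min(wanted, hard)

    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some systems reject the request; keep the old limit.
            pass

    new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, new_soft, hard


if __name__ == "__main__":
    old_soft, new_soft, hard = raise_nofile_soft_limit()
    print("soft limit before:", old_soft)
    print("soft limit after: ", new_soft)
    print("hard limit:       ", hard)
```

Running this from inside a submitted job and again from a login shell
on the same node shows whether the two environments have different
limits. If the hard limit seen by the job is itself too low, it has to
be raised somewhere in the job's parent environment, which this script
cannot do.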

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210900
