[GE users] ulimits and large openmpi scaling - best way to alter limits for running under SGE?

craffi dag at sonsorol.org
Tue Aug 4 15:01:02 BST 2009


Thanks Dan -

The system is running 6.2u3 using the "builtin" rsh methods. Those are
not bound by the usual rsh limits, right?

-Chris
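
One way to see which descriptor limit a job actually gets is to check it from inside the job itself. The sketch below is a minimal illustration, not a recommended SGE configuration: it assumes the job can run Python, and the helper names `nofile_limits` and `raise_soft_to_hard` are made up for this example. It reports the soft and hard limits on open files (the limit behind the "Too many open files (24)" error quoted below) and raises the soft limit as far as the hard limit allows. The hard limit is presumably inherited from whatever process launched the job, so it still has to be raised outside the job.

```python
import errno
import os
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_to_hard():
    """Try to raise the soft limit up to the hard limit.

    An unprivileged process may raise its soft limit as far as the hard
    limit, but no further. Returns the limits in effect afterwards.
    """
    soft, hard = nofile_limits()
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse certain values; keep the old limits.
            pass
    return nofile_limits()


if __name__ == "__main__":
    # EMFILE is the errno reported when a process runs out of descriptors.
    print("EMFILE =", errno.EMFILE, "->", os.strerror(errno.EMFILE))
    print("limits before:", nofile_limits())
    print("limits after: ", raise_soft_to_hard())
```

Submitting this as a small test job on an execution host and comparing the result with a login shell on the same host would show whether jobs under SGE see a lower limit than interactive sessions do.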



On Aug 4, 2009, at 9:50 AM, templedf wrote:

> Are you using rsh as the qrsh transport?  If so, that's your problem.
> It's a port limit in rsh.  To avoid the issue, switch to either the
> built-in interactive job support that was introduced with 6.2, or
> switch to ssh.  Note that there are some caveats with ssh.  Basically,
> you either have to use a customized ssh or live without slave
> accounting and control.
>
> Daniel
>
> craffi wrote:
>> Hi folks,
>>
>> When trying to run an openmpi job on more than about 950 CPU cores I
>> always seem to hit this error:
>>
>>
>>> mca_oob_tcp_accept: accept() failed: Too many open files (24).
>>>
>>
>> ... which clearly seems to be a system limit/ulimit issue. We are
>> running RHEL5 on Intel Nehalem.
>>
>> Looking for the "most proper" way to deal with ulimit settings with
>> SGE - I seem to recall there is a proper way to do this and I can't
>> for the life of me remember. Do we put ulimit statements into
>> submission scripts? Personal shell startup files? Bake them into the
>> SGE daemon start scripts?
>>
>> Any tips on expanding/setting ulimits for running apps under SGE
>> would be appreciated, thanks!
>>
>> -Chris
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210892
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210893
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210898



