[GE users] ulimits and large openmpi scaling - best way to alter limits for running under SGE?

templedf dan.templeton at sun.com
Tue Aug 4 14:50:53 BST 2009

Are you using rsh as the qrsh transport?  If so, that's your problem.  
It's a port limit in rsh: each rsh connection is made from a reserved 
port below 1024, and there are only so many of those.  To avoid the 
issue, switch either to the built-in interactive job support that was 
introduced with 6.2 or to ssh.  Note that there are some caveats with 
ssh.  Basically, you 
either have to use a customized ssh or live without slave accounting and 


craffi wrote:
> Hi folks,
> When trying to run an openmpi job on more than 950 CPU cores, I always  
> seem to hit this error:
>> mca_oob_tcp_accept: accept() failed: Too many open files (24).
> ... which clearly seems to be a system limit/ulimit issue. We are  
> running RHEL5 on Intel Nehalem.
> Looking for the "most proper" way to deal with ulimit settings with  
> SGE - I seem to recall there is a proper way to do this and I can't  
> for the life of me remember. Do we put ulimit statements into  
> submission scripts? Personal shell startup files? Bake them into the  
> SGE daemon start scripts?
> Any tips on expanding/setting ulimits for running apps under SGE would  
> be appreciated, thanks!
> -Chris
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210892
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
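
For anyone who wants to see what limit a job actually runs with, here is a minimal Python sketch that decodes the "(24)" in the quoted error and reports the open-file limit of the current process. It is only a diagnostic, assuming a Linux host and that it is run from inside the job itself; the function names are made up for illustration and it says nothing about how SGE sets limits. As noted in the reply above, raising the limit may not help if the real cause is the rsh transport.

```python
import errno
import os
import resource


def explain_errno(code):
    """Return the symbolic name and message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_nofile_limit():
    """Try to raise the soft limit to the hard limit.

    An unprivileged process may raise its soft limit as far as the
    hard limit. Some platforms refuse certain values, so a failure
    is ignored and the limits actually in force are returned.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    except (ValueError, OSError):
        pass  # keep the existing limits
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("errno 24:", explain_errno(24))
    print("limits before:", nofile_limits())
    print("limits after: ", raise_soft_nofile_limit())
```

Submitting this as a small test job and comparing its output with the same script run in a login shell on the same host should show whether jobs started by SGE see different limits than interactive sessions do.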


