[GE users] ulimits and large openmpi scaling - best way to alter limits for running under SGE?

templedf dan.templeton at sun.com
Tue Aug 4 14:50:53 BST 2009


Are you using rsh as the qrsh transport?  If so, that's your problem.
rsh makes its connections from reserved ports (below 1024), so only a
limited number can be open at once.  To avoid the issue, switch either to
the built-in interactive job support that was introduced with 6.2 or to
ssh.  Note that there are some caveats with ssh.  Basically, you either
have to use a customized ssh or live without accounting and control of
the slave tasks.
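
Either way, it is worth confirming which open-files limit a job actually
sees on the execution host, since errno 24 is EMFILE.  Below is a minimal
Python sketch, not something from this thread; the function names are made
up for illustration, and it assumes Python is available on the compute
nodes.  Submit it as a job so that it reports the limits inherited from the
execution daemon rather than those of a login shell.

```python
import errno
import resource


def nofile_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_nofile():
    """Try to raise the soft limit to the hard limit; return the resulting soft limit."""
    soft, hard = nofile_limits()
    try:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    except (ValueError, OSError):
        # Some systems refuse this; keep the existing soft limit in that case.
        pass
    return nofile_limits()[0]


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # errno 24 is EMFILE, i.e. "Too many open files".
    print("EMFILE errno:", errno.EMFILE)
    print("open files before: soft=%d hard=%d" % (soft, hard))
    print("soft limit after raise attempt: %d" % raise_soft_nofile())
```

A limit raised this way applies only to that process and its children, so
it tells you what is possible inside the job but does not change what
other jobs on the host inherit.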

Daniel

craffi wrote:
> Hi folks,
>
> When trying to run an Open MPI job on more than about 950 CPU cores, I
> always seem to hit this error:
>
>   
>> mca_oob_tcp_accept: accept() failed: Too many open files (24).
>>     
>
> ... which clearly seems to be a system limit/ulimit issue. We are  
> running RHEL5 on Intel Nehalem.
>
> Looking for the "most proper" way to deal with ulimit settings with  
> SGE - I seem to recall there is a proper way to do this and I can't  
> for the life of me remember. Do we put ulimit statements into  
> submission scripts? Personal shell startup files? Bake them into the  
> SGE daemon start scripts?
>
> Any tips on expanding/setting ulimits for running apps under SGE would  
> be appreciated, thanks!
>
> -Chris
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210892
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210893
