[GE users] ulimits and large openmpi scaling - best way to alter limits for running under SGE?

bomb20 Harvey.Richardson at zeenty.com
Tue Aug 4 14:58:30 BST 2009

craffi wrote:
> Hi folks,
> When trying to run an openmpi job on more than 950 CPU cores I always seem
> to hit this error:
>> mca_oob_tcp_accept: accept() failed: Too many open files (24).
> ... which clearly seems to be a system limit/ulimit issue. We are  
> running RHEL5 on Intel Nehalem.
> Looking for the "most proper" way to deal with ulimit settings with  
> SGE - I seem to recall there is a proper way to do this and I can't  
> for the life of me remember. Do we put ulimit statements into  
> submission scripts? Personal shell startup files? Bake them into the  
> SGE daemon start scripts?
> Any tips on expanding/setting ulimits for running apps under SGE would  
> be appreciated, thanks!

The only limit I'm explicitly setting in the SGE config is for maximum
locked memory...

  g4008:harvey% qconf -sconf | grep H_
  execd_params                 PDC_INTERVAL=60 H_MEMORYLOCKED=2G

I think you have to raise the file descriptor limit for all users at the
OS level (in /etc/security/limits.conf on SLES, for example), and users may
also need ulimit commands in their job scripts or shell startup files.
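
As a rough sketch of the script side (the function name raise_nofile and
the value 4096 are only illustrative, not taken from any real setup), a job
script could raise its soft descriptor limit as far as the hard limit
allows before starting the MPI job:

```shell
#!/bin/bash
# The hard limit itself has to be raised by the admin, for example with
# entries like these in /etc/security/limits.conf (illustrative values):
#   *  soft  nofile  4096
#   *  hard  nofile  16384

# Hypothetical helper: set the soft open-file limit to the requested
# value, capped at the current hard limit.
raise_nofile() {
    local want=$1
    local hard
    hard=$(ulimit -H -n)
    if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
        want=$hard
    fi
    ulimit -S -n "$want"
}

raise_nofile 4096
echo "soft open-file limit is now $(ulimit -S -n)"
```

The function only changes the soft limit, so it cannot go beyond whatever
hard limit the OS has handed to the job.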

Also, if the job is started over rsh you will hit the limits earlier.


