[GE users] ulimits and large openmpi scaling - best way to alter limits for running under SGE?

kjpursley kevin.pursley at bp.com
Tue Aug 4 14:59:51 BST 2009


If I understand the question, this is how we do it in the sgeexecd
script.
Below is a snippet from the start of the script.
I don't know how proper it is, but it works.



SGE_ROOT=/hpc/SGE; export SGE_ROOT
SGE_CELL=default; export SGE_CELL

unset CODINE_ROOT GRD_ROOT COD_CELL GRD_CELL

count=0
echo `date` " sge log start " >/tmp/log.sge
while [ ! -d "$SGE_ROOT/util" -a $count -le 15 ]; do
   count=`expr $count + 1`
   echo " Waiting on $SGE_ROOT/util to become available"
   echo `date` "waiting on SGE mount " >>/tmp/log.sge
   df  >>/tmp/log.sge
   cat /proc/mounts  >>/tmp/log.sge
   echo "........." >>/tmp/log.sge
   echo " ........."
   sleep 3
done
# ulimit changes
nofile=16384
echo "setting number of open files to $nofile" >>/tmp/log.sge
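
The fragment stops before the limit is actually changed. A minimal sketch of how it might continue, assuming the script runs as root before sge_execd is started, so that the daemon (and, presumably, the jobs it spawns) inherits the raised limit. The function name set_nofile and the fallback log message are hypothetical and not taken from the original script.

```shell
# Hypothetical continuation of the fragment above; not part of the original script.
nofile=${nofile:-16384}
log=${log:-/tmp/log.sge}

# set_nofile LIMIT: raise the hard limit if it is below LIMIT (normally needs
# root), then set the soft limit to LIMIT. Returns non-zero on failure.
set_nofile() {
    want=$1
    hard=`ulimit -H -n`
    if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$want" ]; then
        ulimit -H -n "$want" 2>/dev/null || return 1
    fi
    ulimit -S -n "$want" 2>/dev/null
}

if set_nofile "$nofile"; then
    echo "open files limit is now `ulimit -S -n`" >>"$log"
else
    echo "could not raise open files limit to $nofile" >>"$log"
fi
```

Resource limits are inherited by child processes, so the change has to happen in the same shell that later starts the daemon; running it in a subshell would have no effect.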

-----Original Message-----
From: craffi [mailto:dag at sonsorol.org] 
Sent: Tuesday, August 04, 2009 8:41 AM
To: users at gridengine.sunsource.net
Subject: [GE users] ulimits and large openmpi scaling - best way to
alter limits for running under SGE?

Hi folks,

When trying to run an openmpi job above 950+ CPU cores I always seem to
hit this error:

> mca_oob_tcp_accept: accept() failed: Too many open files (24).

... which clearly seems to be a system limit/ulimit issue. We are
running RHEL5 on Intel Nehalem.
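
For reference, errno 24 on Linux is EMFILE, the per-process open file descriptor limit. A small sketch for checking what a job actually gets, assuming it is placed at the top of a submission script; the function name report_nofile is made up for illustration.

```shell
# Print the soft and hard open-file limits of the current shell.
# Child processes started from this shell inherit the same values.
report_nofile() {
    echo "soft=`ulimit -S -n` hard=`ulimit -H -n`"
}

report_nofile
```

Comparing this output from an interactive login with the output from inside a job shows whether the limit in effect under SGE differs from the one in a normal shell.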

Looking for the "most proper" way to deal with ulimit settings with SGE
- I seem to recall there is a proper way to do this and I can't for the
life of me remember. Do we put ulimit statements into submission
scripts? Personal shell startup files? Bake them into the SGE daemon
start scripts?

Any tips on expanding/setting ulimits for running apps under SGE would
be appreciated, thanks!

-Chris

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210892

To unsubscribe from this discussion, e-mail:
[users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=210897



