[GE users] OMP_NUM_THREADS in openmpi jobs

buudo uschmidt at mpi-halle.de
Fri Mar 19 10:08:18 GMT 2010

Hi everybody,
I've got trouble with an Open MPI code that uses MKL_NUM_THREADS (similar to OMP_NUM_THREADS) to combine OpenMP and Open MPI in one parallel job. We have 12-core machines (12 slots each). I want to use, e.g., 48 slots in all: 4 machines, with the 12 slots of each machine used for the OpenMP threads. The PE (parallel environment) has allocation rule 12, as I read in http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=225043
Here is the script file:
#$ -S /bin/bash
#$ -N Cu
#$ -cwd
#$ -pe openmp_12 48
#$ -l vf=.3G
#$ -l h_rt=80:30:00

# Initialise environment module
. /usr/local/scripts/profile_motd.no
. /etc/profile.d/modules.sh
module purge
module load ompi-ifort-11.1
#$ -m bea
#$ -cwd
#$ -o /t2work/username/sge_log/$JOB_ID.out
#$ -e /t2work/username/sge_log/$JOB_ID.err
#$ -R y
#$ -m n
mpirun --mca btl openib,self -np $NSLOTS /t2work/username/bin/skkr.run >./out
Unfortunately, the job starts with 48 slots on 4 nodes as expected, but it overloads each node with 12x12 threads. This is seen in an output log file (12x48=576 threads) and in qstat as an enormous load; after a while the job crashes. If I use a different allocation rule, e.g. 2, and OMP_NUM_THREADS=2, the job survives, but the problem still exists.
Can anyone give me some advice?
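
The numbers above are consistent with mpirun starting one MPI rank per slot (-np $NSLOTS = 48) and each rank then starting 12 threads, which gives 48x12=576. A minimal sketch of one possible way around this, assuming the intent is one rank per machine with 12 threads each: divide the granted slots by the threads per rank and start only that many ranks. The names hybrid_ranks and THREADS_PER_RANK are made up for illustration, and the -npernode and -x options should be checked against the installed Open MPI version. The mpirun line is only printed here, not executed.

```shell
#!/bin/bash
# Sketch only: hybrid_ranks and THREADS_PER_RANK are illustrative names,
# not part of the original job script.

# Print the number of MPI ranks so that ranks * threads == slots.
hybrid_ranks() {
    local slots=$1 threads=$2
    if [ $((slots % threads)) -ne 0 ]; then
        echo "slots ($slots) not divisible by threads per rank ($threads)" >&2
        return 1
    fi
    echo $((slots / threads))
}

THREADS_PER_RANK=12          # one rank per 12-core machine (assumption)
NSLOTS=${NSLOTS:-48}         # set by Grid Engine inside a job; 48 as in the post
NRANKS=$(hybrid_ranks "$NSLOTS" "$THREADS_PER_RANK")

# Thread count seen by OpenMP and MKL inside each rank.
export OMP_NUM_THREADS=$THREADS_PER_RANK
export MKL_NUM_THREADS=$THREADS_PER_RANK

# Printed rather than executed, so the sketch can be tried outside a job.
# -npernode 1 asks for one rank per node; -x forwards the variables to the ranks.
echo "mpirun --mca btl openib,self -np $NRANKS -npernode 1" \
     "-x OMP_NUM_THREADS -x MKL_NUM_THREADS /t2work/username/bin/skkr.run"
```

With 48 slots and 12 threads per rank this yields 4 ranks, i.e. 4x12=48 threads in total instead of 576.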

