[GE users] OpenMPI and memory limits

fabiomartinelli fabio.martinelli at esa.int
Thu Oct 28 14:11:03 BST 2010


My OpenMPI 1.4 + SGE 6.2u3 setup works at the basic level: I can submit jobs
and retrieve results.

Now I'd like to implement the following policy. I have an MPI job requesting
16 cores and 16GB of RAM, and I created a queue mpi.q.16 with just 1 core on
each of 16 servers. Which h_vmem limit should I enforce on the queue mpi.q.16,
or what else should I do?
I don't want any single piece of the MPI computation to use more than 16GB of
RAM on a server.
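
To make the numbers concrete, here is a minimal sketch of the arithmetic,
assuming the 16GB is meant as the total for the whole job and is split evenly
over the 16 slots. If 16GB is instead meant as a cap per server, no division
is needed. The script only prints a qsub line and does not submit anything;
job.sh is a placeholder name, and the per-slot h_vmem request assumes h_vmem
is requestable (and, to my understanding, counted per slot) on this cluster.

```shell
#!/bin/sh
# Values taken from the job description above.
TOTAL_GB=16   # memory requested by the whole MPI job, in GB
SLOTS=16      # one slot per server in mpi.q.16

# Even split of the job's memory across its slots (integer GB).
PER_SLOT_GB=$((TOTAL_GB / SLOTS))

# Build (but do not run) a submission line. job.sh is a placeholder script
# name; "openmpi" is one of the PEs in the queue's pe_list.
CMD="qsub -pe openmpi $SLOTS -q mpi.q.16 -l h_vmem=${PER_SLOT_GB}G job.sh"
echo "$CMD"
```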

Also, during and at the end of the MPI computation, can I retrieve the memory
that was used, server by server?
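
As an illustration of what I mean by "server by server", here is a rough
sketch that pairs each host with its peak memory from accounting output. It
assumes that `qacct -j` prints one record per host for the job, each with a
`hostname` line and a `maxvmem` line; whether the PE produces per-host records
or a single summary is an assumption here. The function name
per_host_maxvmem and the JOB_ID variable are hypothetical.

```shell
#!/bin/sh
# Sketch: print "host maxvmem" pairs from `qacct -j <job_id>` output.
# Assumes each accounting record contains "hostname" and "maxvmem" lines,
# and that the job produced one record per host (not a single summary).
per_host_maxvmem() {
    awk '$1 == "hostname" { host = $2 }
         $1 == "maxvmem"  { print host, $2 }'
}

# Usage after the job has finished (JOB_ID is a placeholder):
#   qacct -j "$JOB_ID" | per_host_maxvmem
# While the job is still running, `qstat -j "$JOB_ID"` prints a usage line
# that includes vmem and maxvmem.
```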

This is the current configuration:

[root@scigrid ~]# qconf -sq mpi.q.16
qname                 mpi.q.16
hostlist              @infiniband
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              19
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               mpi mpich2_mpd mpich2_smpd_rsh mvapich2 openmp openmpi
rerun                 FALSE
slots                 1
tmpdir                /tmp
shell                 /bin/bash
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            mpi.q.16
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  1080:00:00
s_cpu                 INFINITY
h_cpu                 360:00:00
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                16G

Thanks a lot.


