[GE users] Infiniband does not work with SGE+MPI on Linux

reuti reuti at staff.uni-marburg.de
Sat Dec 20 12:53:05 GMT 2008


Hi,

On 18.12.2008 at 13:38, vaclam1 at fel.cvut.cz wrote:

> Hi,
>
> I have a problem running parallel programs with MPI+InfiniBand.
> When I run a parallel program through SGE over InfiniBand, the
> program either never finishes or runs extremely slowly!
>
> For example:
> -----------------------------------------------------------------
> frontend$> cat parallel_job.sh
> #!/bin/sh
> # -S /bin/sh
> # -cwd
> # -e .
> # -o .
> # -V

1) You will need:

#$ -V

and the like. With only "# -V" it's just a comment that SGE ignores.
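
I.e. the embedded flags from your script would then read:

#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -e .
#$ -o .
#$ -V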

> INFINIBAND="true"
>
> MY_PARALLEL_PROGRAM="./example1"
>
> if [[ ${INFINIBAND} = "true" ]]
> then
>    # InfiniBand
>    # It is same:
>    # mpirun --mca btl openib,self -np $NSLOTS ${MY_PARALLEL_PROGRAM}
>    mpirun -np $NSLOTS ${MY_PARALLEL_PROGRAM}
> else
>    # Ethernet
>    mpirun --mca btl tcp,self -np $NSLOTS ${MY_PARALLEL_PROGRAM}
> fi
>
>
> frontend$> qsub -pe ompi N -q vip.q parallel_job.sh
> -----------------------------------------------------------------
>
>
> If N >= X, the parallel program never ends.  Some processes are
> waiting at the barrier for messages. The messages are sent to the
> waiting processes but never delivered (the sending process really
> does send them).
>
> If N < X, the parallel program runs very, very long.
>
> The value of N (i.e. X) depends on the specific parallel program. The
> problem occurs only when the messages are sent over InfiniBand.
>
>
>
> When I run parallel programs through SGE over Ethernet, the programs
> run fine.  When I run parallel programs (over InfiniBand) directly
> from the command line with Open MPI, the programs run fine and fast.
>
> For example:
> -----------------------------------------------------------------
> frontend$> mpirun --hostfile hostfile -np 12 ./example1
>
> frontend$> cat hostfile
> node-003 slots=4 max-slots=4
> node-005 slots=4 max-slots=4
> node-008 slots=4 max-slots=4
> node-010 slots=4 max-slots=4
> node-012 slots=4 max-slots=4
> node-014 slots=4 max-slots=4
>
> -----------------------------------------------------------------
>
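
One more thing to check: inside the SGE job you can ask Open MPI which
BTL it actually selects, e.g. (a rough sketch; "btl_base_verbose" just
raises the verbosity of the transport selection):

mpirun --mca btl openib,self --mca btl_base_verbose 30 -np $NSLOTS ${MY_PARALLEL_PROGRAM}

and "ompi_info | grep openib" on a node should confirm that the openib
BTL was built at all.
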
> we have:
>
> SGE 6.2
>
> 6 nodes (6*4 cpu = 6 * Dual-Core AMD Opteron(tm) Processor 2218)
> InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor
> compatibility mode) (rev a0)
> SuSE SLES 10
>
> openMPI 1.2.7
> (config: ./configure --prefix=/home/openmpi-1.2.7/ \
>                       --enable-mpirun-prefix-by-default \
>                       --enable-mpi-threads --with-threads \
>                       --with-loadleveler=/opt/ibmll/LoadL/full/ \
>                       --with-openib=/usr/lib64/
> )
>
> OFED 1.3 (Cisco_OFED-1.3-fcs.sles10.iso)
>
> The configuration of the SGE is the following:
>
> ===============================================================================
> frontend$> qconf -sconf
> #global:
> execd_spool_dir              /opt/sge6_2/default/spool
> mailer                       /bin/mail
> xterm                        /usr/bin/X11/xterm
> load_sensor                  none
> prolog                       /opt/sge6_2/default/common/prolog.sh
> epilog                       none
> shell_start_mode             posix_compliant
> login_shells                 sh,ksh,csh,tcsh
> min_uid                      0
> min_gid                      0
> user_lists                   none
> xuser_lists                  none
> projects                     none
> xprojects                    none
> enforce_project              false
> enforce_user                 auto
> load_report_time             00:00:40
> max_unheard                  00:05:00
> reschedule_unknown           00:00:00
> loglevel                     log_warning

2)

loglevel log_info

might show more.
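
To raise it (a minimal sketch; qconf opens the global configuration in
an editor):

qconf -mconf
# then change
#   loglevel    log_warning
# to
#   loglevel    log_info

The extra messages end up in the qmaster's and execds' "messages"
files in the spool directories.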


> administrator_mail           sgeadmin at star.star
> set_token_cmd                none
> pag_cmd                      none
> token_extend_time            none
> shepherd_cmd                 none
> qmaster_params               none
> execd_params                 H_MEMORYLOCKED=unlimited
> reporting_params             accounting=true reporting=false \
>                              flush_time=00:00:15 joblog=false \
>                              sharelog=00:00:00
> finished_jobs                100
> gid_range                    20000-20004
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin

3)

You can also try the classic rsh/rlogin-based startup instead of the builtin one:

qlogin_command               /usr/bin/telnet
qlogin_daemon                /usr/sbin/in.telnetd
rlogin_command               /usr/sge/utilbin/lx24-amd64/rlogin
rlogin_daemon                /usr/sbin/in.rlogind
rsh_command                  /usr/sge/utilbin/lx24-amd64/rsh
rsh_daemon                   /usr/sge/utilbin/lx24-amd64/rshd -l
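
To narrow it down further, you could test the task startup itself from
inside a parallel job, independent of Open MPI (a rough sketch using
your PE; the loop just starts "hostname" on every granted node):

#!/bin/sh
#$ -cwd
for host in $(cut -d' ' -f1 $PE_HOSTFILE); do
    qrsh -inherit $host hostname
done

submitted with e.g. "qsub -pe ompi 8 test_qrsh.sh". If this already
hangs, the problem lies in the qrsh startup rather than in the openib
BTL.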

> max_aj_instances             2000
> max_aj_tasks                 75000
> max_u_jobs                   0
> max_jobs                     0
> max_advance_reservations     0
> auto_user_oticket            0
> auto_user_fshare             100
> auto_user_default_project    none
> auto_user_delete_time        86400
> delegated_file_staging       false
> reprioritize                 0
> <snip>
>
> ===============================================================================
>
> (all queues are the same - they differ only in h_rt, seq_no, user_lists)
> frontend$> qconf -sq vip.q
> qname                 vip.q
> hostlist              @allhosts
> seq_no                5
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH
> ckpt_list             NONE
> pe_list               ompi
> rerun                 FALSE
> slots                 2,[node-012.star=4],[node-008.star=4], \
>                       [node-003.star=4],[node-014.star=4], \
>                       [node-010.star=4],[node-005.star=4]
> tmpdir                /tmp
> shell                 /bin/sh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant

4) Often it's better to have:

shell_start_mode unix_behavior

With unix_behavior, the interpreter from the script's "#!" line is used
directly, so the queue's shell entry and a "-S" in the script aren't
needed.
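
A minimal sketch of the change (qconf opens the queue definition in an
editor):

qconf -mq vip.q
# then change
#   shell_start_mode      posix_compliant
# to
#   shell_start_mode      unix_behavior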

-- Reuti

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=93500
