[GE users] Still problems submitting mpich jobs - wrong hosts

Reuti reuti at staff.uni-marburg.de
Fri Jul 6 11:19:16 BST 2007



On 06.07.2007 at 11:24, Gerolf Ziegenhain wrote:

> To sum it up once again: I want to start mpich jobs on my SGE. On
> each node there should be exactly two jobs running. How can I
> achieve this?

You mean it is still not working, although you patched the creation
of the machinefile in startmpi.sh? - Reuti
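For reference, the $TMPDIR/machines file used with -catch_rsh is built from $pe_hostfile, which lists one host per line followed by its number of granted slots (then the queue and a processor range). A minimal sketch of such a conversion, assuming that layout, follows. The function name pe_hostfile_to_machinefile is hypothetical and this is not the actual code in startmpi.sh. It writes each host once per granted slot, so with allocation_rule 2 every host should appear exactly twice.

```shell
#!/bin/sh
# Hypothetical helper (not the code shipped in startmpi.sh): convert an SGE
# pe_hostfile into an MPICH machinefile. Assumed input layout, one host per line:
#   <hostname> <slots> <queue> <processor-range>
# Output: <hostname> repeated once per granted slot.
pe_hostfile_to_machinefile() {
    while read -r host nslots _rest; do
        # skip empty lines
        [ -n "$host" ] || continue
        i=0
        while [ "$i" -lt "$nslots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "$1"
}
```

Inside a job script it could be called as pe_hostfile_to_machinefile "$PE_HOSTFILE" > "$TMPDIR/machines", since SGE exports the path of the PE host file in the PE_HOSTFILE variable.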

> My script looks like this:
> #$ -pe mpich 8
> #$ -S /bin/zsh
> #$ -r n
> #$ -cwd
> MPIRUN="/opt/mpich/bin/mpirun"
> ${MPIRUN} -v -machinefile $TMPDIR/machines -np $NSLOTS PROGRAM
>
> The parallel environment is
> qconf -sp mpich
> pe_name           mpich
> slots             72
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   /opt/N1GE/mpi/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args    /opt/N1GE/mpi/stopmpi.sh
> allocation_rule   2
> control_slaves    TRUE
> job_is_first_task TRUE
> urgency_slots     min
>
> The queue is
> qconf -sq q_mpich
> qname                 q_mpich
> hostlist              @s_hosts
> seq_no                21,[@b_hosts=22],[@x_hosts=23]
> load_thresholds       np_load_avg=1,np_load_short=1,n_slots=2, \
>                       [@b_hosts=np_load_avg=1,np_load_short=1,n_slots=2], \
>                       [@x_hosts=np_load_avg=1,np_load_short=1,n_slots=2]
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH
> ckpt_list             NONE
> pe_list               mpich mpich2
> rerun                 TRUE
> slots                 2
> tmpdir                /tmp
> shell                 /bin/bash
> prolog                NONE
> epilog                NONE
> shell_start_mode      unix_behavior
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            ziegen,[@x_hosts=big]
> xuser_lists           matlab matlab1 thor
> subordinate_list      NONE
> complex_values        synchron=0,virtual_free=3G,n_slots=2, \
>                       [@b_hosts=synchron=0,virtual_free=5G,n_slots=2], \
>                       [@x_hosts=synchron=0,virtual_free=17G,n_slots=2]
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 100:00:00
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                2G,[@b_hosts=4G],[@x_hosts=16G]
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                3G,[@b_hosts=5G],[@x_hosts=17G]
>
>
>
> /BR: Gerolf
>
> -- 
> Dipl. Phys. Gerolf Ziegenhain
> Office: Room 46-332 - Erwin-Schrödinger-Str.46 - TU Kaiserslautern - Germany
> Web: gerolf.ziegenhain.com
>
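To narrow down whether the granted hosts are already wrong, the job script quoted above could check the machinefile before calling mpirun. A minimal sketch, assuming $TMPDIR/machines holds one hostname per slot: the hypothetical function check_slots_per_host counts the entries per host and reports every host that does not appear exactly the expected number of times (2, matching allocation_rule 2 in the mpich PE).

```shell
#!/bin/sh
# Hypothetical check: every host in the machinefile ($1) must appear exactly
# $2 times. Offending hosts are reported on stderr; returns 1 if any mismatch.
check_slots_per_host() {
    machinefile=$1
    expected=$2
    status=0
    # sort groups identical hosts; uniq -c prints "<count> <host>"
    counts=$(sort "$machinefile" | uniq -c)
    while read -r count host; do
        [ -n "$host" ] || continue
        if [ "$count" -ne "$expected" ]; then
            echo "host $host has $count entries, expected $expected" >&2
            status=1
        fi
    done <<EOF
$counts
EOF
    return "$status"
}
```

It could be run as check_slots_per_host "$TMPDIR/machines" 2 || exit 1 right before ${MPIRUN}. Note that this only verifies what SGE granted; it cannot show on which hosts mpirun actually started the processes.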





More information about the gridengine-users mailing list