[GE users] mpi process distribution

blueriver at eastday.com
Mon Apr 23 12:28:47 BST 2007


   Thanks for your attention.

   I have set up a PE and added it to the default queue, all.q. My configuration is shown below.

   BTW: I always submit jobs through the qmon interface as root, from /root. Could that cause any problems?

   Thank you.

[root at chenliangyu root]# qconf -spl
[root at chenliangyu root]# qconf -sp mpich
pe_name           mpich
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args    /opt/sge/mpi/stopmpi.sh
allocation_rule   $pe_slots
control_slaves    FALSE
job_is_first_task FALSE
urgency_slots     min
[root at chenliangyu root]# qconf -sq all.q
qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               mpich
rerun                 FALSE
slots                 3,[chenliangyu.WORKGROUP=1],[chenly.WORKGROUP=1], \
tmpdir                /opt/sge/tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
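
   A possible reading of the scheduling error quoted below, offered as an assumption rather than a confirmed diagnosis: with "allocation_rule $pe_slots", SGE places all slots of a job on a single host, and each host here offers only 1 slot, so a request for 2 slots cannot be satisfied. A sketch of the PE with only that one line changed follows. $round_robin is a substitution made for illustration ($fill_up is another built-in rule that can span hosts); every other line is copied from the output above.

```
pe_name           mpich
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args    /opt/sge/mpi/stopmpi.sh
allocation_rule   $round_robin
control_slaves    FALSE
job_is_first_task FALSE
urgency_slots     min
```

   The PE can be opened for editing with "qconf -mp mpich".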


======= 2007-04-23 10:00:45, you wrote: =======

>Did you set up a PE (Parallel Environment) for your parallel jobs?
>And also, you need to add the PE to the queue, see "pe_list":
>On 4/20/07, <blueriver at eastday.com> wrote:
>> Dear all,
>>    I can run a simple parallel job on my test platform (SGE + MPICH), but there are some problems.
>>    I have three machines with 1 slot each. They are all in the same queue, all.q. The TMPDIR of all.q is /opt/sge/tmp, which is on an NFS file system.
>>    all.q has 3 slots in total.
>>    The content of mpi_cpi.sh is:
>>    $MPIR_HOME/util/mpirun -np 3 -machinefile $TMPDIR/machines $MPIR_HOME/examples/basic/cpi
>>    When I qsub the job mpi_cpi.sh with "-pe mpich 1", SGE assigns one machine to the job and runs the cpi program with three processes on that machine. The machine is picked at random. The result is correct.
>>    When I qsub the job mpi_cpi.sh with "-pe mpich 2", SGE reports the error: Jobs can not run because available slots combined under PE are not in the range of job.
>>    So how can I distribute the three processes across the three machines?
>>    Thanks and best regards.
>>                 Tom
>> blueriver at eastday.com
>> 2007-04-21

= = = = = = = = = = = = = = = = = = = =

blueriver at eastday.com