[GE users] mpi process distribution
blueriver at eastday.com
Mon Apr 23 12:28:47 BST 2007
Thanks for your attention.
I have set up a PE and added it to the default queue, all.q. My configuration is shown below.
BTW: I always submit jobs through the qmon interface as root, from /root. Could that cause any problems?
[root@chenliangyu root]# qconf -spl
[root@chenliangyu root]# qconf -sp mpich
start_proc_args /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
[root@chenliangyu root]# qconf -sq all.q
qtype BATCH INTERACTIVE
slots 3,[chenliangyu.WORKGROUP=1],[chenly.WORKGROUP=1], \
======= 2007-04-23 10:00:45 =======
>Did you set up a PE (Parallel Environment) for your parallel jobs?
>And also, you need to add the PE to the queue, see "pe_list":
>On 4/20/07, <blueriver at eastday.com> wrote:
>> Dear all,
>> I can run a simple parallel job on my test platform (SGE + MPICH), but there are some problems.
>> I have three machines, each with 1 slot. They are all in the same queue, all.q. The TMPDIR of all.q is /opt/sge/tmp, on an NFS file system.
>> The total number of slots in all.q is 3.
>> The content of mpi_cpi.sh is:
>> $MPIR_HOME/util/mpirun -np 3 -machinefile $TMPDIR/machines $MPIR_HOME/examples/basic/cpi
>> When I qsub mpi_cpi.sh with "-pe mpich 1", SGE assigns one machine to the job and runs the cpi program with three processes on that machine. The machine is picked at random. The result is correct.
>> When I qsub mpi_cpi.sh with "-pe mpich 2", SGE reports the error: Jobs can not run because available slots combined under PE are not in the range of job.
>> So how can I distribute the three processes across the three machines?
>> Thanks and best regards.
>> blueriver at eastday.com
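
Note on the script quoted above: it hard-codes "-np 3", so mpirun starts three processes no matter how many slots the PE granted. That matches the "-pe mpich 1" behaviour described, where all three processes land on the single host listed in $TMPDIR/machines. Below is a minimal sketch of a variant that takes the process count from the job environment instead. It relies on NSLOTS and PE_HOSTFILE, which SGE sets for parallel jobs, and on startmpi.sh writing $TMPDIR/machines. The helper name mpirun_args is hypothetical, and the "-pe mpich 3" request assumes the PE and queue really allow three slots, which this thread has not yet confirmed.

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -pe mpich 3

# Hypothetical helper: print the mpirun options derived from the SGE job
# environment. NSLOTS is the number of slots granted to the job, and
# $TMPDIR/machines is the host list written by startmpi.sh.
# The fallbacks (1 and /tmp) only apply when run outside SGE.
mpirun_args() {
    echo "-np ${NSLOTS:-1} -machinefile ${TMPDIR:-/tmp}/machines"
}

# Launch only when running inside a parallel environment, i.e. when SGE
# has set PE_HOSTFILE. Outside a PE job this script does nothing.
if [ -n "${PE_HOSTFILE:-}" ]; then
    # Word splitting of $(mpirun_args) is intended here; this assumes
    # TMPDIR contains no whitespace.
    "$MPIR_HOME/util/mpirun" $(mpirun_args) "$MPIR_HOME/examples/basic/cpi"
fi
```

With this form, requesting one slot per host for three hosts should give one cpi process per machine, provided the PE's allocation rule spreads slots across hosts. It does not by itself address the "-pe mpich 2" scheduling error, which is reported before the script ever runs.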
= = = = = = = = = = = = = = = = = = = =
blueriver at eastday.com