[GE users] Basic steps Mpich & SGE loose or tight integration

Marcel Mohr marcel at physik.TU-Berlin.DE
Thu Aug 24 19:10:05 BST 2006


Dear SGE Users

I'm running SGE 6.0 and Mpich 2.0 on a small Beowulf cluster.
Both work well separately ;-)

First of all, as far as I can tell, Mpich with mpd doesn't work well under SGE.

So I need the smpd version, which I have also compiled and which works well
WITHOUT SGE.

I downloaded Reuti's scripts from
http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
and modified start_mpich2.c to use rsh instead of ssh.

I configured my mpich PE as follows:

pe_name           mpich
slots             10
user_lists        NONE
xuser_lists       NONE
start_proc_args   /usr/local/SGE/mpich2_smpd/startmpich2.sh -catch_rsh \
                   $pe_hostfile /usr/local/mpich2-smpd
stop_proc_args    /usr/local/SGE/mpich2_smpd/stopmpich2.sh $pe_hostfile \
                   /usr/local/mpich2-smpd
allocation_rule   $pe_slots
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min
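
For reference, the $pe_hostfile argument in the start and stop lines of this
PE points to a file listing the hosts and slot counts granted to a job. The
following is only a sketch of how such a file could be inspected. It assumes
the usual four-column layout (host, slots, queue, processor range); the sample
contents are made up (david008 is a hypothetical second node), and list_hosts
and total_slots are helper names chosen purely for illustration.

```shell
#!/bin/sh
# Hypothetical pe_hostfile contents; david008 is an invented host name.
# Assumed columns: host, slots, queue instance, processor range.
sample='david007 2 all.q@david007 UNDEFINED
david008 2 all.q@david008 UNDEFINED'

# Print one line per host with its granted slot count (reads stdin).
list_hosts() {
    awk '{ print $1 ": " $2 " slot(s)" }'
}

# Sum the slot counts in column 2 (reads stdin).
total_slots() {
    awk '{ sum += $2 } END { print sum + 0 }'
}

printf '%s\n' "$sample" | list_hosts
echo "total slots: $(printf '%s\n' "$sample" | total_slots)"
```

Inside a real parallel job the same helpers could be fed the actual
allocation, for example with total_slots < "$PE_HOSTFILE", since SGE exports
PE_HOSTFILE to the environment of parallel jobs.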

With this PE I can submit jobs such as
qsub -p 0 -pe mpich 1 mpich2.sh
which work well, but ONLY if I request 1 processor.

If I request 2 or more, the jobs wait in the cluster queue (I guess forever)
with the notification
cannot run in PE "mpich" because it only offers 4 slots
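
One thing that might be worth checking here, offered as an assumption and not
as a confirmed diagnosis: with allocation_rule $pe_slots, SGE places all slots
of a job on a single host, so a request can only be scheduled if one host
alone has that many free slots. The sketch below illustrates that rule on
made-up numbers; the host list, the free-slot counts, and the helper name
fits_on_one_host are all hypothetical.

```shell
#!/bin/sh
# Hypothetical "host free_slots" list; the values are invented for illustration.
free_slots='david007 1
david008 1'

# Succeeds if any single host on stdin has at least $1 free slots,
# which is what a $pe_slots allocation would need.
fits_on_one_host() {
    awk -v need="$1" '$2 + 0 >= need + 0 { found = 1 }
                      END { exit (found ? 0 : 1) }'
}

for n in 1 2; do
    if printf '%s\n' "$free_slots" | fits_on_one_host "$n"; then
        echo "$n slot(s): fits on a single host"
    else
        echo "$n slot(s): no single host has enough free slots"
    fi
done
```

On a real cluster, qstat -j with the job ID shows the scheduler's reasons for
a pending job, and qconf -sp mpich prints the PE definition, so both can be
compared against the per-host slot counts.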

Because I thought the error might come from the start script,
I tried submitting it as a job of its own:

>cat job.startmpich
#$ -cwd
sh startmpich2.sh -catch_rsh $pe_hostfile /usr/local/mpich2-smpd

>qsub -pe mpich 1 job.startmpich

and obtained:
startmpich2.sh: check for smpd daemons (1 of 10)
startmpich2.sh: missing smpd on david007
startmpich2.sh: check for smpd daemons (2 of 10)
startmpich2.sh: found running smpd on david007
startmpich2.sh: got all 1 of 1 nodes


so the start script seems to work.

(Actually, I do get the following error from the stop script:
/usr/local/SGE/mpich2_smpd/stopmpich2.sh: line 117: /home/SGE_spool//david007/active_jobs/1657.1/pe_hostfile/bin/smpd: Not a directory
but that's not too important.)

In principle everything works (submitting, logging in to another node,
starting), but only on 1 node.

I would greatly appreciate any suggestions or hints, and thank you for your
attention if you have read this far.

Marcel Mohr


