[GE users] MPICH2 tight integration
skylar2 at u.washington.edu
Thu Aug 13 15:22:25 BST 2009
> On 13.08.2009 at 01:44, skylar2 wrote:
>> Has anyone gotten the MPICH2 tight integration working? I'm following
>> the stuff in the wiki:
>> I installed the mpich2_mpd package in $SGE_ROOT and set up my parallel
>> environment like so:
>> pe_name mpich2_mpd
>> slots 999
>> user_lists NONE
>> xuser_lists NONE
>> start_proc_args /net/gs/vol3/software/sge/mpich2_mpd/startmpich2.sh \
>> -catch_rsh $pe_hostfile \
>> stop_proc_args /net/gs/vol3/software/sge/mpich2_mpd/stopmpich2.sh \
>> -catch_rsh \
>> allocation_rule $round_robin
>> control_slaves TRUE
>> job_is_first_task FALSE
>> urgency_slots min
>> When I try to run a job it fails during the mpd boot up:
> Was your first test already with such a high number of nodes? Does it
> work with only 4/8/16... slots on 1/2/4 nodes?
> a) Are all nodes the same, or can it be that some have a firewall?
They are all the same, and on a private network so no firewall. I also
verified that I could mpdboot and mpirun outside SGE, and that worked fine.
> b) It may be that the startup sequence just takes too long and you
> have to increase the "sleep" time in startmpich2.sh. But then the
> startup would be really long.
I doubled the SLEEPTIME and it still didn't work even with only 24
slots, so I don't think that was the issue.
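
For small test runs like this, one way to see how many nodes the mpd ring has to span is to summarize the $pe_hostfile from inside the job. This is only a minimal sketch: it assumes the usual pe_hostfile layout of one line per host and queue (host, slots, queue, processor range), and the function name summarize_pe_hostfile is made up for illustration; it is not part of the mpich2_mpd package.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SGE or mpich2_mpd): count the distinct
# hosts and the total slots listed in an SGE pe_hostfile.
# Assumed line format: <host> <slots> <queue> <processor range>
summarize_pe_hostfile() {
    awk '
        NF >= 2 { hosts[$1] = 1; slots += $2 }   # a host may appear once per queue
        END {
            n = 0
            for (h in hosts) n++
            printf "hosts=%d slots=%d\n", n, slots
        }
    ' "$1"
}

# Inside a parallel job SGE exports PE_HOSTFILE; outside a job this does nothing.
if [ -n "${PE_HOSTFILE:-}" ]; then
    summarize_pe_hostfile "$PE_HOSTFILE"
fi
```

Called at the top of the job script, this prints one line with the number of distinct hosts and the total slots granted, which can be compared against the number of mpds that actually come up.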
> c) What about using a daemonless smpd startup?
I'm partial to MPIs with daemons, but if it works I won't complain. I'll
see if I can get it wired up.
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S048, (206)-685-7354
-- University of Washington School of Medicine