[GE users] MPICH2 Hydra integration with SGE
reuti at staff.uni-marburg.de
Fri Aug 6 16:21:42 BST 2010
I just looked into the Hydra startup in MPICH2. For a tight integration into SGE it looks like the default MPICH integration can be reused, with the small change to have:
as MPICH2 with Hydra will also make a local "ssh/rsh" call to start the first daemon on the master node of the submitted parallel job. Since the absolute path "/usr/bin/ssh" is compiled in as the default for "ssh" (the same applies to "rsh"), the jobscript needs:
mpiexec -bootstrap rsh -bootstrap-exec rsh -machinefile $TMPDIR/machines ./mpihello
so that a plain "rsh" *) is called and SGE's "-catch_rsh" takes care of the rest automatically.
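To illustrate why the plain name matters: a compiled-in absolute path such as "/usr/bin/rsh" is executed directly, while a bare "rsh" is looked up via PATH. The sketch below assumes, as in SGE's stock MPICH integration, that "-catch_rsh" links the rsh-wrapper into $TMPDIR as "rsh" and that $TMPDIR comes first in the job's PATH. A throwaway directory and a dummy wrapper (hypothetical, it only echoes what it would do) stand in for both.

```shell
#!/bin/sh
# Stand-in for the job's $TMPDIR (assumption: created here only for illustration).
JOB_TMPDIR=$(mktemp -d)

# Dummy "rsh" playing the role of SGE's rsh-wrapper (hypothetical content:
# the real wrapper would run "qrsh -inherit ..." instead of echoing).
printf '#!/bin/sh\necho "would run: qrsh -inherit $*"\n' > "$JOB_TMPDIR/rsh"
chmod +x "$JOB_TMPDIR/rsh"

# With $TMPDIR first in PATH, a plain "rsh" resolves to the wrapper ...
PATH="$JOB_TMPDIR:$PATH"
resolved=$(command -v rsh)
echo "plain rsh     -> $resolved"

# ... whereas a compiled-in absolute path like /usr/bin/rsh never consults PATH.
echo "absolute path -> /usr/bin/rsh (PATH lookup bypassed)"
```

This is also why "-bootstrap-exec rsh" is given as a bare name rather than a path.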
*) Note: the actual communication method is configured solely in SGE and can be "builtin", "classic rsh" or even "ssh" (following the Howto at http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html). From the application's point of view, it could just as well be told to call "fubar" to reach another node. In SGE, start_proc_args would then have to create a link named "fubar" in $TMPDIR that points to SGE's rsh-wrapper. Only inside the rsh-wrapper does `qrsh -inherit ...` finally use the method configured in SGE to reach the other node.
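A minimal sketch of such a start_proc_args step, assuming the rsh-wrapper lives at $SGE_ROOT/mpi/rsh as in the stock MPICH integration (adjust for your installation). The helper name link_wrapper is hypothetical; "fubar" is just the arbitrary name from the note above.

```shell
#!/bin/sh
# Hypothetical helper for a start_proc_args script: expose SGE's rsh-wrapper
# in $TMPDIR under an arbitrary name that the application has been told to call.
# Assumption: the wrapper lives at $SGE_ROOT/mpi/rsh, as in the stock MPICH
# integration; adjust the path for your installation.
link_wrapper() {
    name=$1
    wrapper="$SGE_ROOT/mpi/rsh"
    if [ ! -x "$wrapper" ]; then
        echo "rsh-wrapper not found or not executable: $wrapper" >&2
        return 1
    fi
    # Symlink $TMPDIR/<name> -> wrapper; the wrapper is what calls "qrsh -inherit".
    ln -sf "$wrapper" "$TMPDIR/$name"
}

# In start_proc_args one would call, e.g.:
#   link_wrapper fubar || exit 1
```

The jobscript would then presumably pass "-bootstrap-exec fubar" instead of "-bootstrap-exec rsh"; which remote method is used in the end is still decided only inside the wrapper.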
I will add this also to the MPICH2 Howto.
PS: Sorry for cross-posting, but I suspect not all SGE users running MPICH2 also follow the MPICH list.