[GE users] MPICH2 Hydra integration with SGE

reuti reuti at staff.uni-marburg.de
Fri Aug 6 16:21:42 BST 2010


I just looked into the Hydra startup in MPICH2. For a tight integration into SGE it looks like the default MPICH integration can be reused, with the small change to have:

job_is_first_task  FALSE

as MPICH2 with Hydra also makes a local "ssh/rsh" call to start the first daemon on the master node of the submitted parallel job. Since the absolute path "/usr/bin/ssh" to "ssh" is compiled in by default (same for "rsh"), the job script needs:

mpiexec -bootstrap rsh -bootstrap-exec rsh -machinefile $TMPDIR/machines ./mpihello

to have a call to a plain "rsh" *), so that SGE's "-catch_rsh" will do all the rest automatically.
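
For illustration, a minimal job script around this call might look like the sketch below. It rests on assumptions that are not part of the setup described above: the PE name "mpich2_hydra" and the slot count are placeholders, the helper rsh_is_wrapped is hypothetical, and the check assumes that $TMPDIR comes first in the job's PATH so that a plain "rsh" resolves to the wrapper linked there by "-catch_rsh".

```shell
#!/bin/sh
# PE name and slot count below are placeholders for your own setup.
#$ -cwd
#$ -pe mpich2_hydra 4

# Hypothetical helper: succeed only if a plain "rsh" resolves to the
# wrapper that "-catch_rsh" links into $TMPDIR.
rsh_is_wrapped() {
    [ "$(command -v rsh)" = "$TMPDIR/rsh" ]
}

# Launch only inside an SGE parallel job, where SGE sets PE_HOSTFILE.
if [ -n "$PE_HOSTFILE" ]; then
    rsh_is_wrapped || { echo "rsh does not resolve to \$TMPDIR/rsh" >&2; exit 1; }
    mpiexec -bootstrap rsh -bootstrap-exec rsh \
        -machinefile "$TMPDIR/machines" ./mpihello
fi
```

The check is only a safety net: if a login file resets PATH, Hydra would otherwise call a real rsh binary and bypass SGE's control of the slave tasks.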

*) Note: the final communication method is set up solely in SGE, which can be "builtin", "classic rsh" or also "ssh" (according to the Howto at http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html). From the point of view of the application, it's also possible to instruct it to call "fubar" to reach another node. In SGE it would then be necessary in start_proc_args to create a link in $TMPDIR which is named "fubar" and points to SGE's rsh-wrapper. Only inside the rsh-wrapper will the `qrsh -inherit ...` use the method which is set up in SGE to reach another node in the end.
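
As a rough sketch of that link step: the function name link_launcher is made up for illustration, and the wrapper location "$SGE_ROOT/mpi/rsh" in the usage comment is an assumption about where the stock MPI integration keeps its rsh-wrapper, so adjust it to your installation.

```shell
#!/bin/sh
# Hypothetical helper for a start_proc_args script:
#   link_launcher NAME WRAPPER DIR
# creates DIR/NAME as a symbolic link to WRAPPER.
link_launcher() {
    name=$1
    wrapper=$2
    dir=$3
    # -f replaces a stale link left behind by an earlier run
    ln -sf "$wrapper" "$dir/$name"
}

# Inside start_proc_args this might be called as (path is an assumption):
#   link_launcher fubar "$SGE_ROOT/mpi/rsh" "$TMPDIR"
```

The application would then be told to use "fubar" as its remote startup command, in the same way "rsh" is given to mpiexec above.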


I will add this also to the MPICH2 Howto.

-- Reuti

PS: Sorry for crossposting, but I think not all SGE users who are using MPICH2 are also following the MPICH list.

