[GE users] SGE+mvapich2 tight integration
soliday at aps.anl.gov
Wed Jul 28 16:33:08 BST 2010
I use SGE to submit mvapich2 jobs to our cluster. What I would like is
to tightly integrate it so that when I use the qdel command it will find
and delete all the processes. Currently I have it set up so that SGE
creates a hostfile and then calls mpirun_rsh:
/act/mvapich2-1.5/gnu/bin/mpirun_rsh -rsh -hostfile \\\$TMPDIR/machines
-np $mvapich2 MV2_ENABLE_AFFINITY=0 MV2_ON_DEMAND_THRESHOLD=5000 $command
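For illustration, here is a minimal sketch of how the hostfile step could look. It assumes the usual SGE $PE_HOSTFILE layout (host name in the first column, slot count in the second) and that one host line per slot is acceptable to mpirun_rsh. The function name pe_to_machines is hypothetical and not part of SGE or mvapich2.

```shell
#!/bin/sh
# Hypothetical helper: expand an SGE PE hostfile into a machines file.
# Assumes each input line looks like: <host> <slots> <queue> <processor-range>
# and prints the host name once per slot.
pe_to_machines() {
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Inside a parallel job SGE sets PE_HOSTFILE and TMPDIR; outside a job
# this block is skipped, so the file is safe to source for testing.
if [ -n "${PE_HOSTFILE:-}" ] && [ -n "${TMPDIR:-}" ]; then
    pe_to_machines "$PE_HOSTFILE" > "$TMPDIR/machines"
fi
```

The resulting $TMPDIR/machines is what gets passed to -hostfile in the command above.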
I really like the mpirun_rsh command because I don't have to have an mpd
ring already running. We used to do this but a single node going down
would always screw up the mpd ring.
I have built a special version of qdel that will identify all the
processes on all the nodes prior to doing a basic qdel. It will then do a
manual kill on all the leftover PIDs. This works but I would prefer a
tight integration. I've been reading up on it and it looked to me like
the essential part is to use "qrsh -inherit -V" in place of rsh. So I
tried editing src/pm/mpirun/mpirun_rsh.c and
src/pm/mpirun/include/mpirun_rsh.h so that it would use qrsh instead of
rsh. Unfortunately when I go to launch a program now I get:
(gnome-ssh-askpass:20089): Gtk-WARNING **: cannot open display:
Host key verification failed.
Error in init phase...wait for cleanup! (1/2 mpispawn connections)
Failed in initilization phase, cleaned up all the mpispawn!
So my question is: is it possible to get SGE+mvapich2 tight integration
working with the mpirun_rsh launch method?
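To make the "qrsh -inherit -V" idea concrete, here is a minimal sketch of a wrapper that could stand in for rsh instead of patching the mvapich2 sources. It is an assumption, not something I have working: it presumes the launcher calls it as "rsh <host> <command...>" with no rsh options before the host name, and that the parallel environment permits qrsh -inherit (control_slaves set to TRUE). The function name rsh_via_qrsh is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: forward an rsh-style call to SGE's qrsh so the
# remote process is started under SGE's control and qdel can reach it.
# Assumes the first argument is the target host and the rest is the command.
rsh_via_qrsh() {
    host=$1
    shift
    # -inherit: start the task inside the already running parallel job
    # -V: export the current environment to the remote task
    qrsh -inherit -V "$host" "$@"
}
```

Usage would be something like: rsh_via_qrsh node1 hostname, run from inside a parallel job.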