[GE users] SGE+mvapich2 tight integration

reuti reuti at staff.uni-marburg.de
Wed Jul 28 17:04:47 BST 2010


On 28.07.2010 at 17:33, soliday wrote:

> I use SGE to submit mvapich2 jobs to our cluster. What I would like is 
> to tightly integrate it so that when I use the qdel command it will find 
> and delete all the processes. Currently I have it set up so that SGE 
> creates a hostfile and then calls mpirun_rsh:
> /act/mvapich2-1.5/gnu/bin/mpirun_rsh -rsh -hostfile \\\$TMPDIR/machines 

Regarding the three backslashes: is this your job script, or part of a job-script generator?

> -np $mvapich2 MV2_ENABLE_AFFINITY=0 MV2_ON_DEMAND_THRESHOLD=5000 $command


This will just put the value from the `qsub` command there.

> I really like the mpirun_rsh command because I don't have to have an mpd 
> ring already running. We used to do this but a single node going down 
> would always screw up the mpd ring.
> I have built a special version of qdel that will identify all the 
> threads on all the nodes prior to doing a basic qdel. It will then do a 
> manual kill on all the left over PIDs. This works but I would prefer a 
> tight integration. I've been reading up on it and it looked to me like 
> the essential part is to use "qrsh -inherit -V" in place of rsh. So I 

Yep, this is the way to go.

> tried editing src/pm/mpirun/mpirun_rsh.c and 
> src/pm/mpirun/include/mpirun_rsh.h so that it would use qrsh instead of 
> rsh. Unfortunately when I go to launch a program now I get:

The question here is: which call does mpirun_rsh.c use by default? If it's a plain "rsh", it can be caught by the rsh wrapper set up by your start_proc_args (-catch_rsh), and the source doesn't need to be modified. You are requesting a PE, right?

Nevertheless, now that you have changed it, it should of course work as well.
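
To illustrate what such an rsh wrapper amounts to, here is a minimal sketch. It assumes the launcher calls it as "rsh <host> <command...>" with no rsh options in front of the host; a real wrapper would also have to deal with those. The function name rsh_via_qrsh and the QRSH override variable are made up for this example and are not part of SGE.

```shell
#!/bin/sh
# Sketch of an "rsh" stand-in that forwards remote starts to SGE.
# Assumption: it is found in PATH before the real rsh (e.g. in $TMPDIR).

# QRSH is a hypothetical override, only there so that the real binary
# can be replaced by "echo" for a dry run.
QRSH=${QRSH:-qrsh}

rsh_via_qrsh() {
    # First argument is the target host, the rest is the remote command.
    host=$1
    shift
    # -inherit: start the task inside the already granted parallel job
    # -V:       pass the current environment on to the task
    $QRSH -inherit -V "$host" "$@"
}

# Only forward when this file is actually invoked under the name "rsh".
if [ "${0##*/}" = "rsh" ]; then
    rsh_via_qrsh "$@"
fi
```

With QRSH=echo set, calling "rsh_via_qrsh node01 hostname" only prints the qrsh arguments that would be used, which is a cheap way to check the forwarding without a running job.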

> (gnome-ssh-askpass:20089): Gtk-WARNING **: cannot open display:
> Host key verification failed.
> Error in init phase...wait for cleanup! (1/2 mpispawn connections)
> Failed in initilization phase, cleaned up all the mpispawn!
> So my question is: is it possible to get SGE+mvapich2 tight integration 
> working with the mpirun_rsh launch method?

How is SGE configured to start slave tasks, i.e. what are the entries for "rsh_command" and "rsh_daemon" in `qconf -sconf`? I assume it's configured to use SSH, but you haven't set up a passphraseless SSH key. Even better would be to use hostbased authentication.
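
A quick way to check both points could look like the sketch below. The helper names show_startup_entries and check_batch_ssh are made up for this example, and node01 stands for any of your execution hosts. On an SSH-based setup the two entries typically point to ssh and to "sshd -i", but the exact paths depend on the installation.

```shell
#!/bin/sh
# Print only the rsh_command / rsh_daemon lines from a cluster
# configuration dump read on stdin.
show_startup_entries() {
    awk '$1 ~ /^rsh_(command|daemon)$/ { print }'
}

# Try a non-interactive login to the given host. BatchMode=yes makes ssh
# fail instead of asking for a password or passphrase, which is the
# situation a job runs in (no terminal, no display).
check_batch_ssh() {
    ssh -o BatchMode=yes "$1" true
}

# Intended use on a submit host (hypothetical host name):
#   qconf -sconf | show_startup_entries
#   check_batch_ssh node01 && echo "login without prompt works"
```

If check_batch_ssh fails between the nodes of the cluster, qrsh -inherit will fail in the same way inside the job, which would match the askpass and host key messages above.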


-- Reuti

(Is mpirun_rsh specific to mvapich2? It doesn't seem to be available in plain mpich2.)

> Thanks,
> --Bob Soliday
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=270815
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

