[GE users] Mvapich processes not killed on qdel

Brian R. Smith brs at usf.edu
Thu May 10 17:24:17 BST 2007



Never mind!  The patch was pretty trivial: the original code called 
execl, whereas the patch changed it to call execlp.  It is attached, 
just so I don't keep talking about a phantom patch.  It's for version 
mvapich-0.9.5.117.  I can't for the life of me remember where I found 
this...  It will probably look familiar to someone here.
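
For anyone who hasn't looked at the exec() family lately, the execl() ->
execlp() switch is the heart of it: execl() runs exactly the file it is
given, while execlp() resolves a bare name through $PATH.  A minimal
sketch of the difference (just an illustration, not MVAPICH source; the
host name and remote command are placeholders):

    /* execl() vs. execlp(): why the hard-coded /usr/bin/rsh never hits
     * the SGE rsh-wrapper, but a PATH lookup of plain "rsh" can.       */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Old behaviour: exactly this file is executed, no PATH search,
         * so a wrapper placed earlier in PATH is never seen.           */
        /* execl("/usr/bin/rsh", "rsh", "somehost", "hostname", (char *) NULL); */

        /* Patched behaviour: "rsh" is looked up in $PATH, so whatever
         * "rsh" comes first there (e.g. the SGE wrapper) is what runs. */
        execlp("rsh", "rsh", "somehost", "hostname", (char *) NULL);

        perror("execlp");       /* only reached if the exec fails */
        return 1;
    }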

-Brian



Brian R. Smith wrote:
> Reuti,
>
> I haven't taken the time to really look at the code so I don't know 
> why it works the way it does, but I know it calls execlp. Also, the 
> patch for the beta appeared rather non-trivial (if we consider this to 
> be an easy-to-solve problem) so I assume there is more to the problem 
> than just the PATH issue.
>
> -Brian
>
> Reuti wrote:
>> On 10.05.2007, at 16:40, Brian R. Smith wrote:
>>
>>> Reuti & Mike,
>>>
>>> I dealt with mvapich and SGE-tight integration (and hence abandoned 
>>> mvapich in favor of OpenMPI, which has wonderful SGE integration 
>>> support and a much cooler plugin-based framework, IMHO). The 
>>> mpirun_rsh command is actually a piece of C code where path values 
>>> for rsh and ssh are hard-coded into the program. Because this code 
>>> seems to blow away at least the PATH variable during execution, the 
>>> exec() call
>>
>> If the problem is just the execl() in the source, one could try 
>> execlp() with a plain rsh as it will honor the set path.
>>
>> -- Reuti
>>
>>> to just "rsh" will fail since no paths will be defined (and hence 
>>> attempts to set PATH in $SGE_ROOT/mpi/startmpi.sh will fail). There 
>>> was a patch floating around for a previous beta release, but it will 
>>> not apply to the current release cleanly. The file in question 
>>> in version 0.9.8 is
>>>
>>> mpid/ch_gen2/process/mpirun_rsh.c
>>>
>>> Beginning on line 130, I believe, you will see
>>>
>>> #define RSH_CMD "/usr/bin/rsh"
>>> #define SSH_CMD "/usr/bin/ssh"
>>>
>>> I looked around for fixes (as I said, you cannot just change these 
>>> to "rsh" or "ssh", it will fail) but as of a couple weeks ago, no 
>>> one seems to have resolved this. I hope this helps.
>>>
>>> -Brian
>>>
>>>
>>>
>>> Reuti wrote:
>>>> Hi,
>>>>
>>>> On 09.05.2007, at 21:53, Mike Hanby wrote:
>>>>
>>>>> I created a simple helloworld job that prints a message and then
>>>>> sleeps for 5 minutes. If I qdel the job after 1 minute, the job is
>>>>> removed from the queue but remains running on the nodes for 4 more
>>>>> minutes. I'm using rsh in this example; the ps info is below:
>>>>
>>>> but still the processes are not children of the 
>>>> sge_execd/sge_shepherd. So the rsh-wrapper isn't used. Is the path 
>>>> to the rsh binary hardcoded somewhere in your MPI scripts? There is 
>>>> /usr/bin/rsh mentioned - can you change it somewhere to read just 
>>>> rsh, so that the rsh-wrapper will be accessed instead of the binary?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> I submitted the job using the following job script:
>>>>> #!/bin/bash
>>>>> #$ -S /bin/bash
>>>>> #$ -cwd
>>>>> #$ -N TestMVAPICH
>>>>> #$ -pe mvapich 4
>>>>> #$ -v MPIR_HOME=/usr/local/topspin/mpi/mpich
>>>>> #$ -v MPICH_PROCESS_GROUP=no
>>>>> #$ -V
>>>>> export MPI_HOME=/usr/local/topspin/mpi/mpich
>>>>> export LD_LIBRARY_PATH=/usr/local/topspin/lib64:$MPI_HOME/lib64:$LD_LIBRARY_PATH
>>>>> export PATH=$TMPDIR:$MPI_HOME/bin:$PATH
>>>>> MPIRUN=${MPI_HOME}/bin/mpirun_rsh
>>>>> $MPIRUN -rsh -np $NSLOTS -machinefile $TMPDIR/machines ./hello-mvapich
>>>>>
>>>>> This is the ps output on the node while the job is running in the 
>>>>> queue:
>>>>> $ ssh compute-0-7 "ps -e f -o pid,ppid,pgrp,command|grep myuser|grep -v grep"
>>>>> 1460 3611 1460 \_ sshd: myuser [priv]
>>>>> 1464 1460 1460 \_ sshd: myuser at notty
>>>>> 951 947 951 | \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>>> MPIRUN_PORT=32826
>>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>>> MPIRUN_RANK=0 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>> 954 948 954 | \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>>> MPIRUN_PORT=32826
>>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>>> MPIRUN_RANK=1 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>> 955 949 955 | \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>>> MPIRUN_PORT=32826
>>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>>> MPIRUN_RANK=2 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>> 966 950 966 \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>>> MPIRUN_PORT=32826
>>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>>> MPIRUN_RANK=3 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>> 943 942 938 \_ /usr/bin/rsh compute-0-7 cd
>>>>> /home/myuser/pmemdTest-mvapich; /usr/bin/env MPIRUN_MPD=0
>>>>> MPIRUN_HOST=compute-0-7.local MPIRUN_PORT=32826
>>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>>> MPIRUN_RANK=0 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>> 944 942 938 \_ /usr/bin/rsh compute-0-7 cd
>>>>> /home/myuser/pmemdTest-mvapich; /usr/bin/env MPIRUN_MPD=0
>>>>> MPIRUN_HOST=compute-0-7.local MPIRUN_PORT=32826
>>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>>> MPIRUN_RANK=1 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>> 945 942 938 \_ /usr/bin/rsh compute-0-7 cd
>>>>> /home/myuser/pmemdTest-mvapich; /usr/bin/env MPIRUN_MPD=0
>>>>> MPIRUN_HOST=compute-0-7.local MPIRUN_PORT=32826
>>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>>> MPIRUN_RANK=2 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>> 946 942 938 \_ /usr/bin/rsh compute-0-7 cd
>>>>> /home/myuser/pmemdTest-mvapich; /usr/bin/env MPIRUN_MPD=0
>>>>> MPIRUN_HOST=compute-0-7.local MPIRUN_PORT=32826
>>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>>> MPIRUN_RANK=3 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>>
>>>>> And the ps after I qdel the job
>>>>> $ ssh compute-0-7 "ps -e f -o pid,ppid,pgrp,command|grep myuser|grep -v grep"
>>>>> 1735 3611 1735 \_ sshd: myuser [priv]
>>>>> 1739 1735 1735 \_ sshd: myuser at notty
>>>>> 951 947 951 | \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>>> MPIRUN_PORT=32826
>>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>>> MPIRUN_RANK=0 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>> 954 948 954 | \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>>> MPIRUN_PORT=32826
>>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>>> MPIRUN_RANK=1 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>> 955 949 955 | \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>>> MPIRUN_PORT=32826
>>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>>> MPIRUN_RANK=2 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>> 966 950 966 \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>>> MPIRUN_PORT=32826
>>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>>> MPIRUN_RANK=3 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>>
>>>>> -----Original Message-----
>>>>> From: Mike Hanby [mailto:mhanby at uab.edu]
>>>>> Sent: Wednesday, May 09, 2007 11:59
>>>>> To: users at gridengine.sunsource.net
>>>>> Subject: RE: [GE users] Mvapich processes not killed on qdel
>>>>>
>>>>> Hmm, I changed the mpirun command to mpirun_rsh -rsh and submitted
>>>>> the job; it started and failed with a bunch of connections refused.
>>>>> By default Rocks disables RSH.
>>>>>
>>>>> Does tight integration only work with rsh? If so, I'll see if I 
>>>>> can get
>>>>> that enabled and try again.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>>>> Sent: Wednesday, May 09, 2007 11:27
>>>>> To: users at gridengine.sunsource.net
>>>>> Subject: Re: [GE users] Mvapich processes not killed on qdel
>>>>>
>>>>> Hi,
>>>>>
>>>>> can you please post the process tree (master and slave) of a running
>>>>> job on a node by using the ps command:
>>>>>
>>>>> ps -e f -o pid,ppid,pgrp,command
>>>>>
>>>>> Are you sure that the SGE rsh-wrapper is used, as you mentioned
>>>>> mpirun_ssh?
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>> On 09.05.2007, at 17:43, Mike Hanby wrote:
>>>>>
>>>>>> Howdy,
>>>>>>
>>>>>> I have GE 6.0u8 on a Rocks 4.2.1 cluster with Infiniband and the
>>>>>> Topspin roll (which includes mvapich).
>>>>>>
>>>>>> When I qdel an mvapich job, the job is immediately removed from the
>>>>>> queue; however, most of the processes on the nodes do not get
>>>>>> killed. It appears that the mpirun_ssh process does get killed, but
>>>>>> the actual job executables (sander.MPI) do not.
>>>>>>
>>>>>> I followed the directions for tight integration of Mvapich:
>>>>>> http://gridengine.sunsource.net/project/gridengine/howto/mvapich/MVAPICH_Integration.html
>>>>>>
>>>>>> The job runs fine, but again it doesn't kill off processes when
>>>>>> qdel'd.
>>>>>>
>>>>>> Here's the pe:
>>>>>>
>>>>>> $ qconf -sp mvapich
>>>>>> pe_name           mvapich
>>>>>> slots             9999
>>>>>> user_lists        NONE
>>>>>> xuser_lists       NONE
>>>>>> start_proc_args   /share/apps/gridengine/mvapich/startmpi.sh -catch_rsh $pe_hostfile
>>>>>> stop_proc_args    /share/apps/gridengine/mvapich/stopmpi.sh
>>>>>> allocation_rule   $round_robin
>>>>>> control_slaves    TRUE
>>>>>> job_is_first_task FALSE
>>>>>> urgency_slots     min
>>>>>>
>>>>>> The only modification made to the startmpi.sh script was to change
>>>>>> the location of the hostname and rsh scripts from $SGE_ROOT to
>>>>>> /share/apps/gridengine/mvapich.
>>>>>>
>>>>>> Any suggestions on what I should look for?
>>>>>>
>>>>>> Thanks, Mike
>>>>>
>>>
>>>
>>>
>>
>
>


-- 
--------------------------------------------------------
+ Brian R. Smith                                       +
+ HPC Systems Analyst & Programmer                     +
+ Research Computing, University of South Florida      +
+ 4202 E. Fowler Ave. LIB618                           +
+ Office Phone: 1 (813) 974-1467                       +
+ Mobile Phone: 1 (813) 230-3441                       +
+ Organization URL: http://rc.usf.edu                  +
--------------------------------------------------------



    [ Part 2: "Attached Text" ]

diff -u -d -r mvapich-0.9.5.117.ori/mpid/vapi/process/mpirun_rsh.c mvapich-0.9.5.117/mpid/vapi/process/mpirun_rsh.c
--- mvapich-0.9.5.117.ori/mpid/vapi/process/mpirun_rsh.c	2005-06-18 21:10:40.000000000 -0700
+++ mvapich-0.9.5.117/mpid/vapi/process/mpirun_rsh.c	2005-07-14 16:53:09.000000000 -0700
@@ -126,14 +126,17 @@
 void process_termination(void);
 int set_fds(fd_set * rfds, fd_set * efds);
 
-#define RSH_CMD	"/usr/bin/rsh"
-#define SSH_CMD	"/usr/bin/ssh"
+#define RSH_CMD	"rsh"
+#define SSH_CMD	"ssh"
 #ifdef USE_SSH
 int use_rsh = 0;
 #else
 int use_rsh = 1;
 #endif
 
+int use_wd = 1;
+
+#define RSH_ARG ""
 #define SSH_ARG "-q"
 
 #define SH_NAME_LEN	(128)
@@ -155,6 +158,7 @@
     {"help", no_argument, 0, 0},
     {"v", no_argument, 0, 0},
     {"tv", no_argument, 0, 0},
+    {"nowd", no_argument, 0, 0},
     {0, 0, 0, 0}
 };
 
@@ -285,6 +289,9 @@
  		}	
   		break;
             case 11:
+                use_wd = 0;
+                break;
+            case 12:
                 usage();
                 exit(0);
                 break;
@@ -745,7 +752,7 @@
     int str_len;
 
     str_len = strlen(command_name) + strlen(env) + strlen(wd) + 
-        strlen(mpirun_processes) + 512;
+        strlen(mpirun_processes) + strlen(plist[i].hostname) + 512;
 
     if ((remote_command = malloc(str_len)) == NULL) {
         fprintf(stderr, "Failed to malloc %d bytes for remote_command\n",
@@ -763,12 +770,30 @@
      * this is the remote command we execute whether we were are using 
      * an xterm or using rsh directly 
      */
-    sprintf(remote_command, 
-	    "cd %s; %s MPIRUN_MPD=0 MPIRUN_HOST=%s MPIRUN_PORT=%d "
-	    "MPIRUN_PROCESSES='%s' " 
-	    "MPIRUN_RANK=%d MPIRUN_NPROCS=%d MPIRUN_ID=%d %s %s %s",
-	    wd, ENV_CMD, mpirun_host, port, mpirun_processes, i, 
-	    nprocs, id, display,env, command_name); 
+    if (use_rsh) {
+      if (use_wd) {
+	sprintf(remote_command, 
+		"cd %s; %s MPIRUN_MPD=0 MPIRUN_HOST=%s MPIRUN_PORT=%d "
+		"MPIRUN_PROCESSES='%s' " 
+		"MPIRUN_RANK=%d MPIRUN_NPROCS=%d MPIRUN_ID=%d %s %s %s",
+		wd, ENV_CMD, mpirun_host, port, mpirun_processes, i, 
+		nprocs, id, display,env, command_name); 
+      } else {
+	sprintf(remote_command, 
+		"%s MPIRUN_MPD=0 MPIRUN_HOST=%s MPIRUN_PORT=%d "
+		"MPIRUN_PROCESSES='%s' " 
+		"MPIRUN_RANK=%d MPIRUN_NPROCS=%d MPIRUN_ID=%d %s %s %s",
+		ENV_CMD, mpirun_host, port, mpirun_processes, i, 
+		nprocs, id, display,env, command_name); 
+      }
+    } else {
+      sprintf(remote_command, 
+	      "cd %s; %s MPIRUN_MPD=0 MPIRUN_HOST=%s MPIRUN_PORT=%d "
+	      "MPIRUN_PROCESSES='%s' " 
+	      "MPIRUN_RANK=%d MPIRUN_NPROCS=%d MPIRUN_ID=%d %s %s %s",
+	      wd, ENV_CMD, mpirun_host, port, mpirun_processes, i, 
+	      nprocs, id, display,env, command_name); 
+    }
 
     if (xterm_on) {
         sprintf(xterm_command, "%s; echo process exited", remote_command);
@@ -790,12 +815,13 @@
         if (xterm_on) {
             if (show_on) {
                 printf("command: %s -T %s -e %s %s %s %s\n", XTERM,
-                       xterm_title, sh_cmd, use_rsh ? "" : SSH_ARG,
+                       xterm_title, sh_cmd, use_rsh ? RSH_ARG : SSH_ARG,
                        plist[i].hostname, xterm_command);
             } else {
                 if (use_rsh) {
                     execl(XTERM, XTERM, "-T", xterm_title, "-e",
-                          sh_cmd, plist[i].hostname, xterm_command, NULL);
+                          sh_cmd, RSH_ARG, plist[i].hostname, 
+			  xterm_command, NULL);
                 } else {
                     execl(XTERM, XTERM, "-T", xterm_title, "-e",
                           sh_cmd, SSH_ARG, plist[i].hostname,
@@ -808,10 +834,10 @@
                        remote_command);
             } else {
                 if (use_rsh) {
-                    execl(sh_cmd, sh_cmd, plist[i].hostname,
-                          remote_command, NULL);
+                    execlp(sh_cmd, " ", RSH_ARG, plist[i].hostname, 
+			   remote_command, NULL);
                 } else {
-                    execl(sh_cmd, sh_cmd, SSH_ARG, plist[i].hostname,
+                    execlp(sh_cmd, " ", SSH_ARG, plist[i].hostname,
                           remote_command, NULL);
                 }
             }
@@ -933,7 +959,7 @@
 
 void usage(void)
 {
-    fprintf(stderr, "usage: mpirun_rsh [-v] [-rsh|-ssh] "
+    fprintf(stderr, "usage: mpirun_rsh [-v] [-rsh|-ssh] [-nowd]"
             "[-paramfile=pfile] "
   	    "[-debug] -[tv] [-xterm] [-show] -np N "
             "(-hostfile hfile | h1 h2 ... hN) a.out args\n");
@@ -941,6 +967,7 @@
     fprintf(stderr, "\tv          => Show version and exit\n");
     fprintf(stderr, "\trsh        => " "to use rsh for connecting\n");
     fprintf(stderr, "\tssh        => " "to use ssh for connecting\n");
+    fprintf(stderr, "\tnowd       => " "do not 'cd $wd' with rsh\n");
     fprintf(stderr, "\tparamfile  => "
             "file containing run-time MVICH parameters\n");
     fprintf(stderr, "\tdebug      => "
diff -u -d -r mvapich-0.9.5.117.ori/mpid/vapi_multirail/process/mpirun_rsh.c mvapich-0.9.5.117/mpid/vapi_multirail/process/mpirun_rsh.c
--- mvapich-0.9.5.117.ori/mpid/vapi_multirail/process/mpirun_rsh.c	2005-06-18 21:10:40.000000000 -0700
+++ mvapich-0.9.5.117/mpid/vapi_multirail/process/mpirun_rsh.c	2005-07-14 16:51:48.000000000 -0700
@@ -126,14 +126,17 @@
 void process_termination(void);
 int set_fds(fd_set * rfds, fd_set * efds);
 
-#define RSH_CMD	"/usr/bin/rsh"
-#define SSH_CMD	"/usr/bin/ssh"
+#define RSH_CMD	"rsh"
+#define SSH_CMD	"ssh"
 #ifdef USE_SSH
 int use_rsh = 0;
 #else
 int use_rsh = 1;
 #endif
 
+int use_wd = 1;
+
+#define RSH_ARG ""
 #define SSH_ARG "-q"
 
 #define SH_NAME_LEN	(128)
@@ -156,6 +159,7 @@
     {"ssh", no_argument, 0, 0},
     {"help", no_argument, 0, 0},
     {"v", no_argument, 0, 0},
+    {"nowd", no_argument, 0, 0},
     {0, 0, 0, 0}
 };
 
@@ -246,6 +250,9 @@
                 exit(0);
                 break;
             case 10:
+                use_wd = 0;
+                break;
+            case 11:
                 usage();
                 exit(0);
                 break;
@@ -612,10 +619,24 @@
      * this is the remote command we execute whether we were are using 
      * an xterm or using rsh directly 
      */
-    sprintf(remote_command, "cd %s; %s MPIRUN_HOST=%s MPIRUN_PORT=%d "
-            "MPIRUN_RANK=%d MPIRUN_NPROCS=%d MPIRUN_ID=%d %s %s",
-            wd, ENV_CMD, mpirun_host, port, i, nprocs, id, env,
-            command_name);
+    if (use_rsh) {
+      if (use_wd) {
+	sprintf(remote_command, "cd %s; %s MPIRUN_HOST=%s MPIRUN_PORT=%d "
+		"MPIRUN_RANK=%d MPIRUN_NPROCS=%d MPIRUN_ID=%d %s %s",
+		wd, ENV_CMD, mpirun_host, port, i, nprocs, id, env,
+		command_name);
+      } else {
+	sprintf(remote_command, "%s MPIRUN_HOST=%s MPIRUN_PORT=%d "
+		"MPIRUN_RANK=%d MPIRUN_NPROCS=%d MPIRUN_ID=%d %s %s",
+		ENV_CMD, mpirun_host, port, i, nprocs, id, env,
+		command_name);
+      }
+    } else {
+      sprintf(remote_command, "cd %s; %s MPIRUN_HOST=%s MPIRUN_PORT=%d "
+	      "MPIRUN_RANK=%d MPIRUN_NPROCS=%d MPIRUN_ID=%d %s %s",
+	      wd, ENV_CMD, mpirun_host, port, i, nprocs, id, env,
+	      command_name);
+    }
 
 
     if (xterm_on) {
@@ -638,12 +659,13 @@
         if (xterm_on) {
             if (show_on) {
                 printf("command: %s -T %s -e %s %s %s %s\n", XTERM,
-                       xterm_title, sh_cmd, use_rsh ? "" : SSH_ARG,
+                       xterm_title, sh_cmd, use_rsh ? RSH_ARG : SSH_ARG,
                        plist[i].hostname, xterm_command);
             } else {
                 if (use_rsh) {
                     execl(XTERM, XTERM, "-T", xterm_title, "-e",
-                          sh_cmd, plist[i].hostname, xterm_command, NULL);
+                          sh_cmd, RSH_ARG, plist[i].hostname, 
+			  xterm_command, NULL);
                 } else {
                     execl(XTERM, XTERM, "-T", xterm_title, "-e",
                           sh_cmd, SSH_ARG, plist[i].hostname,
@@ -656,10 +678,10 @@
                        remote_command);
             } else {
                 if (use_rsh) {
-                    execl(sh_cmd, sh_cmd, plist[i].hostname,
+                    execlp(sh_cmd, " ", RSH_ARG, plist[i].hostname,
                           remote_command, NULL);
                 } else {
-                    execl(sh_cmd, sh_cmd, SSH_ARG, plist[i].hostname,
+                    execlp(sh_cmd, " ", SSH_ARG, plist[i].hostname,
                           remote_command, NULL);
                 }
             }
@@ -780,7 +802,7 @@
 
 void usage(void)
 {
-    fprintf(stderr, "usage: mpirun_rsh [-v] [-rsh|-ssh] "
+    fprintf(stderr, "usage: mpirun_rsh [-v] [-rsh|-ssh] [-nowd]"
             "[-paramfile=pfile] "
             "[-debug] [-xterm] [-show] -np N "
             "(-hostfile hfile | h1 h2 ... hN) a.out args\n");
@@ -788,6 +810,7 @@
     fprintf(stderr, "\tv          => Show version and exit\n");
     fprintf(stderr, "\trsh        => " "to use rsh for connecting\n");
     fprintf(stderr, "\tssh        => " "to use ssh for connecting\n");
+    fprintf(stderr, "\tnowd       => " "do not 'cd $wd' with rsh\n");
     fprintf(stderr, "\tparamfile  => "
             "file containing run-time MVICH parameters\n");
     fprintf(stderr, "\tdebug      => "


