[GE users] Mvapich processes not killed on qdel

Brian R. Smith brs at usf.edu
Thu May 10 17:15:12 BST 2007


Reuti,

I haven't taken the time to really look at the code, so I don't know why 
it works the way it does, but I know it calls execlp. Also, the patch 
for the beta appeared rather non-trivial (if we consider this an 
easy-to-solve problem), so I assume there is more to the problem than 
just the PATH issue.
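
To make the execl()/execlp() point below concrete, the kind of change being
discussed would look roughly like the fragment here. This is only a sketch
with made-up function and variable names, not the actual mpirun_rsh.c code,
and whether such a one-line swap is really enough is exactly what I'm unsure
about:

/* Sketch only -- not the real mvapich source.  execl() takes a fixed
 * path, so the SGE rsh-wrapper that startmpi.sh drops into $TMPDIR is
 * never considered; execlp() searches PATH, so a plain "rsh" would be
 * resolved to the wrapper as long as $TMPDIR is still at the front of
 * PATH when the exec happens. */
#include <stdio.h>
#include <unistd.h>

static void spawn_remote(const char *host, const char *remote_cmd)
{
    /* current style (hard-coded absolute path, bypasses the wrapper): */
    /* execl("/usr/bin/rsh", "rsh", host, remote_cmd, (char *)NULL); */

    /* suggested style: let PATH resolve "rsh", wrapper included */
    execlp("rsh", "rsh", host, remote_cmd, (char *)NULL);
    perror("execlp rsh");   /* only reached if the exec failed */
}

int main(void)
{
    spawn_remote("compute-0-7", "hostname");
    return 1;
}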

-Brian

Reuti wrote:
> Am 10.05.2007 um 16:40 schrieb Brian R. Smith:
>
>> Reuti & Mike,
>>
>> I dealt with mvapich and SGE tight integration (and hence abandoned 
>> mvapich in favor of OpenMPI, which has wonderful SGE integration 
>> support and a much cooler plugin-based framework, IMHO). The 
>> mpirun_rsh command is actually a piece of C code in which the paths 
>> to rsh and ssh are hard-coded. Because this code seems to blow away 
>> at least the PATH variable during execution, the exec() call
>
> If the problem is just the execl() in the source, one could try 
> execlp() with a plain rsh as it will honor the set path.
>
> -- Reuti
>
>> to just "rsh" will fail since no paths will be defined (and hence 
>> attempts to set PATH in $SGE_ROOT/mpi/startmpi.sh will fail). There 
>> was a patch floating around for a previous beta release, but it will 
>> not apply cleanly to the current release. The file in question in 
>> version 0.9.8 is
>>
>> mpid/ch_gen2/process/mpirun_rsh.c
>>
>> Beginning on line 130, I believe, you will see
>>
>> #define RSH_CMD "/usr/bin/rsh"
>> #define SSH_CMD "/usr/bin/ssh"
>>
>> I looked around for fixes (as I said, you cannot just change these to 
>> "rsh" or "ssh"; it will fail), but as of a couple of weeks ago no one 
>> seems to have resolved this. I hope this helps.
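
(To expand on why a bare "rsh" won't work here: if the launcher rebuilds its
environment and loses PATH before the exec, there is nothing left to resolve
"rsh" to the wrapper in $TMPDIR. The fragment below is purely illustrative --
it is not the mvapich code -- and the save-and-restore-PATH idea is just one
conceivable workaround, not a tested patch.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* remember the submit-time PATH (which has $TMPDIR, and therefore
     * the SGE rsh-wrapper, at the front) before it gets lost */
    const char *p = getenv("PATH");
    char *saved_path = p ? strdup(p) : NULL;

    /* ... imagine the launcher building its own MPIRUN_* environment
     * here and dropping PATH in the process ... */
    unsetenv("PATH");

    /* with PATH gone, a plain "rsh" can no longer resolve to the wrapper
     * in $TMPDIR (depending on the libc, the exec may fail outright or
     * fall back to a default path and hit the real binary); restoring
     * the saved PATH first lets it find the wrapper again */
    if (saved_path)
        setenv("PATH", saved_path, 1);

    execlp("rsh", "rsh", "compute-0-7", "hostname", (char *)NULL);
    perror("execlp rsh");   /* only reached if the exec failed */
    return 1;
}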
>>
>> -Brian
>>
>>
>>
>> Reuti wrote:
>>> Hi,
>>>
>>> Am 09.05.2007 um 21:53 schrieb Mike Hanby:
>>>
>>>> I created a simple helloworld job that prints a message and then 
>>>> sleeps for 5 minutes. If I qdel the job after 1 minute, the job is 
>>>> removed from the queue but remains running on the nodes for 4 more 
>>>> minutes. I'm using rsh in this example; the ps info is below:
>>>
>>> But the processes are still not children of sge_execd/sge_shepherd, 
>>> so the rsh-wrapper isn't being used. Is the path to the rsh binary 
>>> hard-coded somewhere in your MPI scripts? /usr/bin/rsh is mentioned 
>>> in the output - can you change it somewhere to read just rsh, so 
>>> that the rsh-wrapper is used instead of the binary?
>>>
>>> -- Reuti
>>>
>>>
>>>> I submitted the job using the following job script:
>>>> #!/bin/bash
>>>> #$ -S /bin/bash
>>>> #$ -cwd
>>>> #$ -N TestMVAPICH
>>>> #$ -pe mvapich 4
>>>> #$ -v MPIR_HOME=/usr/local/topspin/mpi/mpich
>>>> #$ -v MPICH_PROCESS_GROUP=no
>>>> #$ -V
>>>> export MPI_HOME=/usr/local/topspin/mpi/mpich
>>>> export LD_LIBRARY_PATH=/usr/local/topspin/lib64:$MPI_HOME/lib64:$LD_LIBRARY_PATH
>>>> export PATH=$TMPDIR:$MPI_HOME/bin:$PATH
>>>> MPIRUN=${MPI_HOME}/bin/mpirun_rsh
>>>> $MPIRUN -rsh -np $NSLOTS -machinefile $TMPDIR/machines ./hello-mvapich
>>>>
>>>> This is the ps output on the node while the job is running in the 
>>>> queue:
>>>> $ ssh compute-0-7 "ps -e f -o pid,ppid,pgrp,command | grep myuser | grep -v grep"
>>>> 1460 3611 1460 \_ sshd: myuser [priv]
>>>> 1464 1460 1460 \_ sshd: myuser at notty
>>>> 951 947 951 | \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>> MPIRUN_PORT=32826
>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>> MPIRUN_RANK=0 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>> 954 948 954 | \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>> MPIRUN_PORT=32826
>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>> MPIRUN_RANK=1 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>> 955 949 955 | \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>> MPIRUN_PORT=32826
>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>> MPIRUN_RANK=2 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>> 966 950 966 \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>> MPIRUN_PORT=32826
>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>> MPIRUN_RANK=3 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>> 943 942 938 \_ /usr/bin/rsh compute-0-7 cd
>>>> /home/myuser/pmemdTest-mvapich; /usr/bin/env MPIRUN_MPD=0
>>>> MPIRUN_HOST=compute-0-7.local MPIRUN_PORT=32826
>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>> MPIRUN_RANK=0 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>> 944 942 938 \_ /usr/bin/rsh compute-0-7 cd
>>>> /home/myuser/pmemdTest-mvapich; /usr/bin/env MPIRUN_MPD=0
>>>> MPIRUN_HOST=compute-0-7.local MPIRUN_PORT=32826
>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>> MPIRUN_RANK=1 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>> 945 942 938 \_ /usr/bin/rsh compute-0-7 cd
>>>> /home/myuser/pmemdTest-mvapich; /usr/bin/env MPIRUN_MPD=0
>>>> MPIRUN_HOST=compute-0-7.local MPIRUN_PORT=32826
>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>> MPIRUN_RANK=2 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>> 946 942 938 \_ /usr/bin/rsh compute-0-7 cd
>>>> /home/myuser/pmemdTest-mvapich; /usr/bin/env MPIRUN_MPD=0
>>>> MPIRUN_HOST=compute-0-7.local MPIRUN_PORT=32826
>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>> MPIRUN_RANK=3 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>
>>>> And the ps after I qdel the job
>>>> $ ssh compute-0-7 "ps -e f -o pid,ppid,pgrp,command | grep myuser | grep -v grep"
>>>> 1735 3611 1735 \_ sshd: myuser [priv]
>>>> 1739 1735 1735 \_ sshd: myuser at notty
>>>> 951 947 951 | \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>> MPIRUN_PORT=32826
>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>> MPIRUN_RANK=0 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>> 954 948 954 | \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>> MPIRUN_PORT=32826
>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>> MPIRUN_RANK=1 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>> 955 949 955 | \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>> MPIRUN_PORT=32826
>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>> MPIRUN_RANK=2 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>> 966 950 966 \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>>> MPIRUN_PORT=32826
>>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>>> MPIRUN_RANK=3 MPIRUN_NPROCS=4 MPIRUN_ID=942 ./hello-mvapich
>>>>
>>>> -----Original Message-----
>>>> From: Mike Hanby [mailto:mhanby at uab.edu]
>>>> Sent: Wednesday, May 09, 2007 11:59
>>>> To: users at gridengine.sunsource.net
>>>> Subject: RE: [GE users] Mvapich processes not killed on qdel
>>>>
>>>> Hmm, I changed the mpirun command to mpirun_rsh -rsh and submitted the
>>>> job; it started and then failed with a bunch of connection-refused
>>>> errors. By default, Rocks disables rsh.
>>>>
>>>> Does tight integration only work with rsh? If so, I'll see if I can 
>>>> get
>>>> that enabled and try again.
>>>>
>>>> -----Original Message-----
>>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>>> Sent: Wednesday, May 09, 2007 11:27
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] Mvapich processes not killed on qdel
>>>>
>>>> Hi,
>>>>
>>>> can you please post the process tree (master and slave) of a running
>>>> job on a node using the ps command:
>>>>
>>>> ps -e f -o pid,ppid,pgrp,command
>>>>
>>>> Are you sure that the SGE rsh-wrapper is being used, since you
>>>> mentioned mpirun_ssh?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>> Am 09.05.2007 um 17:43 schrieb Mike Hanby:
>>>>
>>>>> Howdy,
>>>>>
>>>>> I have GE 6.0u8 on a Rocks 4.2.1 cluster with Infiniband and the
>>>>> Topspin roll (which includes mvapich).
>>>>>
>>>>>
>>>>>
>>>>> When I qdel an mvapich job, the job is immediately removed from the
>>>>> queue; however, most of the processes on the nodes do not get
>>>>> killed. It appears that the mpirun_ssh process does get killed,
>>>>> but the actual job executables (sander.MPI) do not.
>>>>>
>>>>>
>>>>>
>>>>> I followed the directions for tight integration of Mvapich:
>>>>>
>>>>> http://gridengine.sunsource.net/project/gridengine/howto/mvapich/MVAPICH_Integration.html
>>>>>
>>>>>
>>>>>
>>>>> The job runs fine, but again the processes are not killed off when
>>>>> it is qdel'd.
>>>>>
>>>>>
>>>>>
>>>>> Here's the pe:
>>>>>
>>>>> $ qconf -sp mvapich
>>>>> pe_name            mvapich
>>>>> slots              9999
>>>>> user_lists         NONE
>>>>> xuser_lists        NONE
>>>>> start_proc_args    /share/apps/gridengine/mvapich/startmpi.sh -catch_rsh \
>>>>>                    $pe_hostfile
>>>>> stop_proc_args     /share/apps/gridengine/mvapich/stopmpi.sh
>>>>> allocation_rule    $round_robin
>>>>> control_slaves     TRUE
>>>>> job_is_first_task  FALSE
>>>>> urgency_slots      min
>>>>>
>>>>>
>>>>>
>>>>> The only modification made to the startmpi.sh script was to change
>>>>> the location of the hostname and rsh scripts from $SGE_ROOT to
>>>>> /share/apps/gridengine/mvapich.
>>>>>
>>>>>
>>>>>
>>>>> Any suggestions on what I should look for?
>>>>>
>>>>>
>>>>>
>>>>> Thanks, Mike
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>


-- 
--------------------------------------------------------
+ Brian R. Smith                                       +
+ HPC Systems Analyst & Programmer                     +
+ Research Computing, University of South Florida      +
+ 4202 E. Fowler Ave. LIB618                           +
+ Office Phone: 1 (813) 974-1467                       +
+ Mobile Phone: 1 (813) 230-3441                       +
+ Organization URL: http://rc.usf.edu                  +
--------------------------------------------------------

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



