[GE users] Mvapich processes not killed on qdel

Reuti reuti at staff.uni-marburg.de
Thu May 10 16:18:48 BST 2007


On 10.05.2007 at 16:40, Brian R. Smith wrote:

> Reuti & Mike,
>
> I dealt with mvapich and SGE tight integration (and hence abandoned
> mvapich in favor of OpenMPI, which has wonderful SGE integration
> support and a much cooler plugin-based framework, IMHO). The
> mpirun_rsh command is actually a piece of C code in which the path
> values for rsh and ssh are hard-coded into the program. Because this
> code seems to blow away at least the PATH variable during execution,
> the exec() call

If the problem is just the execl() call in the source, one could try
execlp() with a plain rsh instead, as it will honor the PATH that is
set.
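
For illustration, roughly like this - only a sketch of the idea, not a
tested patch against mvapich 0.9.8; the actual argument list built in
mpirun_rsh.c will differ, so "host" and "command" below are just
placeholders (both calls are declared in <unistd.h>):

    /* before: execl() takes a pathname and never searches PATH, so the
       hard-coded RSH_CMD ("/usr/bin/rsh") always bypasses the SGE
       rsh-wrapper */
    execl(RSH_CMD, RSH_CMD, host, command, (char *) NULL);

    /* after: execlp() resolves a plain "rsh" via PATH, so the
       rsh-wrapper that startmpi.sh places in $TMPDIR is found first -
       provided PATH is still intact in the child process */
    execlp("rsh", "rsh", host, command, (char *) NULL);

If mpirun_rsh really does clear PATH before the exec, as Brian
describes, that environment would have to be preserved as well.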

-- Reuti

> to just "rsh" will fail since no paths will be defined (and hence
> attempts to set PATH in $SGE_ROOT/mpi/startmpi.sh will fail).
> There was a patch floating around for a previous beta release, but
> it will not apply cleanly to the current release. The file in
> question in version 0.9.8 is
>
> mpid/ch_gen2/process/mpirun_rsh.c
>
> Beginning on line 130, I believe, you will see
>
> #define RSH_CMD "/usr/bin/rsh"
> #define SSH_CMD "/usr/bin/ssh"
>
> I looked around for fixes (as I said, you cannot just change these
> to "rsh" or "ssh"; it will fail), but as of a couple of weeks ago no
> one seemed to have resolved this. I hope this helps.
>
> -Brian
>
>
>
> Reuti wrote:
>> Hi,
>>
>> On 09.05.2007 at 21:53, Mike Hanby wrote:
>>
>>> I created a simple helloworld job that prints a message and then
>>> sleeps for 5 minutes. If I qdel the job after 1 minute, the job is
>>> removed from the queue but remains running on the nodes for 4 more
>>> minutes. I'm using rsh in this example; the ps info is below:
>>
>> but still the processes are not children of the
>> sge_execd/sge_shepherd, so the rsh-wrapper isn't used. Is the path to
>> the rsh binary hardcoded somewhere in your MPI scripts? /usr/bin/rsh
>> is mentioned there - can you change it somewhere to read just rsh, so
>> that the rsh-wrapper is used instead of the binary?
>>
>> -- Reuti
>>
>>
>>> I submitted the job using the following job script:
>>> #!/bin/bash
>>> #$ -S /bin/bash
>>> #$ -cwd
>>> #$ -N TestMVAPICH
>>> #$ -pe mvapich 4
>>> #$ -v MPIR_HOME=/usr/local/topspin/mpi/mpich
>>> #$ -v MPICH_PROCESS_GROUP=no
>>> #$ -V
>>> export MPI_HOME=/usr/local/topspin/mpi/mpich
>>> export LD_LIBRARY_PATH=/usr/local/topspin/lib64:$MPI_HOME/lib64:$LD_LIBRARY_PATH
>>> export PATH=$TMPDIR:$MPI_HOME/bin:$PATH
>>> MPIRUN=${MPI_HOME}/bin/mpirun_rsh
>>> $MPIRUN -rsh -np $NSLOTS -machinefile $TMPDIR/machines ./hello-mvapich
>>>
>>> This is the ps output on the node while the job is running in the  
>>> queue:
>>> $ ssh compute-0-7 "ps -e f -o pid,ppid,pgrp,command|grep myuser|grep -v grep"
>>>  1460  3611  1460  \_ sshd: myuser [priv]
>>>  1464  1460  1460      \_ sshd: myuser at notty
>>>   951   947   951  |   \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>> MPIRUN_PORT=32826
>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>> MPIRUN_RANK=0 MPIRUN_NPROCS=4 MPIRUN_ID=942      ./hello-mvapich
>>>   954   948   954  |   \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>> MPIRUN_PORT=32826
>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>> MPIRUN_RANK=1 MPIRUN_NPROCS=4 MPIRUN_ID=942      ./hello-mvapich
>>>   955   949   955  |   \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>> MPIRUN_PORT=32826
>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>> MPIRUN_RANK=2 MPIRUN_NPROCS=4 MPIRUN_ID=942      ./hello-mvapich
>>>   966   950   966      \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>> MPIRUN_PORT=32826
>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>> MPIRUN_RANK=3 MPIRUN_NPROCS=4 MPIRUN_ID=942      ./hello-mvapich
>>>   943   942   938              \_ /usr/bin/rsh compute-0-7 cd
>>> /home/myuser/pmemdTest-mvapich; /usr/bin/env MPIRUN_MPD=0
>>> MPIRUN_HOST=compute-0-7.local MPIRUN_PORT=32826
>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>> MPIRUN_RANK=0 MPIRUN_NPROCS=4 MPIRUN_ID=942      ./hello-mvapich
>>>   944   942   938              \_ /usr/bin/rsh compute-0-7 cd
>>> /home/myuser/pmemdTest-mvapich; /usr/bin/env MPIRUN_MPD=0
>>> MPIRUN_HOST=compute-0-7.local MPIRUN_PORT=32826
>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>> MPIRUN_RANK=1 MPIRUN_NPROCS=4 MPIRUN_ID=942      ./hello-mvapich
>>>   945   942   938              \_ /usr/bin/rsh compute-0-7 cd
>>> /home/myuser/pmemdTest-mvapich; /usr/bin/env MPIRUN_MPD=0
>>> MPIRUN_HOST=compute-0-7.local MPIRUN_PORT=32826
>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>> MPIRUN_RANK=2 MPIRUN_NPROCS=4 MPIRUN_ID=942      ./hello-mvapich
>>>   946   942   938              \_ /usr/bin/rsh compute-0-7 cd
>>> /home/myuser/pmemdTest-mvapich; /usr/bin/env MPIRUN_MPD=0
>>> MPIRUN_HOST=compute-0-7.local MPIRUN_PORT=32826
>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>> MPIRUN_RANK=3 MPIRUN_NPROCS=4 MPIRUN_ID=942      ./hello-mvapich
>>>
>>> And the ps output after I qdel the job:
>>> $ ssh compute-0-7 "ps -e f -o pid,ppid,pgrp,command|grep myuser|grep -v grep"
>>>  1735  3611  1735  \_ sshd: myuser [priv]
>>>  1739  1735  1735      \_ sshd: myuser at notty
>>>   951   947   951  |   \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>> MPIRUN_PORT=32826
>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>> MPIRUN_RANK=0 MPIRUN_NPROCS=4 MPIRUN_ID=942      ./hello-mvapich
>>>   954   948   954  |   \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>> MPIRUN_PORT=32826
>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>> MPIRUN_RANK=1 MPIRUN_NPROCS=4 MPIRUN_ID=942      ./hello-mvapich
>>>   955   949   955  |   \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>> MPIRUN_PORT=32826
>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>> MPIRUN_RANK=2 MPIRUN_NPROCS=4 MPIRUN_ID=942      ./hello-mvapich
>>>   966   950   966      \_ bash -c cd /home/myuser/pmemdTest-mvapich;
>>> /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=compute-0-7.local
>>> MPIRUN_PORT=32826
>>> MPIRUN_PROCESSES='compute-0-7:compute-0-7:compute-0-7:compute-0-7:'
>>> MPIRUN_RANK=3 MPIRUN_NPROCS=4 MPIRUN_ID=942      ./hello-mvapich
>>>
>>> -----Original Message-----
>>> From: Mike Hanby [mailto:mhanby at uab.edu]
>>> Sent: Wednesday, May 09, 2007 11:59
>>> To: users at gridengine.sunsource.net
>>> Subject: RE: [GE users] Mvapich processes not killed on qdel
>>>
>>> Hmm, I changed the mpirun command to mpirun_rsh -rsh and submitted
>>> the job; it started but then failed with a bunch of "connection
>>> refused" errors. By default, Rocks disables RSH.
>>>
>>> Does tight integration only work with rsh? If so, I'll see if I can
>>> get that enabled and try again.
>>>
>>> -----Original Message-----
>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>> Sent: Wednesday, May 09, 2007 11:27
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] Mvapich processes not killed on qdel
>>>
>>> Hi,
>>>
>>> can you please post the process tree (master and slave) of a running
>>> job on a node, using the ps command:
>>>
>>> ps -e f -o pid,ppid,pgrp,command
>>>
>>> Are you sure that the SGE rsh-wrapper is used, as you mentioned
>>> mpirun_ssh?
>>>
>>> -- Reuti
>>>
>>>
>>> On 09.05.2007 at 17:43, Mike Hanby wrote:
>>>
>>>> Howdy,
>>>>
>>>> I have GE 6.0u8 on a Rocks 4.2.1 cluster with Infiniband and the
>>>> Topspin roll (which includes mvapich).
>>>>
>>>>
>>>>
>>>> When I qdel an mvapich job, the job is immediately removed from the
>>>> queue; however, most of the processes on the nodes do not get
>>>> killed. It appears that the mpirun_ssh process does get killed, but
>>>> the actual job executables (sander.MPI) do not.
>>>>
>>>>
>>>>
>>>> I followed the directions for tight integration of Mvapich:
>>>>
>>>> http://gridengine.sunsource.net/project/gridengine/howto/mvapich/MVAPICH_Integration.html
>>>>
>>>>
>>>>
>>>> The job runs fine, but again its processes are not killed off when
>>>> it is qdel'd.
>>>>
>>>>
>>>>
>>>> Here's the pe:
>>>>
>>>> $ qconf -sp mvapich
>>>> pe_name           mvapich
>>>> slots             9999
>>>> user_lists        NONE
>>>> xuser_lists       NONE
>>>> start_proc_args   /share/apps/gridengine/mvapich/startmpi.sh -catch_rsh \
>>>>                   $pe_hostfile
>>>> stop_proc_args    /share/apps/gridengine/mvapich/stopmpi.sh
>>>> allocation_rule   $round_robin
>>>> control_slaves    TRUE
>>>> job_is_first_task FALSE
>>>> urgency_slots     min
>>>>
>>>>
>>>>
>>>> The only modification made to the startmpi.sh script was to change
>>>> the location of the hostname and rsh scripts from $SGE_ROOT to
>>>> /share/apps/gridengine/mvapich.
>>>>
>>>>
>>>>
>>>> Any suggestions on what I should look for?
>>>>
>>>>
>>>>
>>>> Thanks, Mike
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>
> -- 
> --------------------------------------------------------
> + Brian R. Smith                                       +
> + HPC Systems Analyst & Programmer                     +
> + Research Computing, University of South Florida      +
> + 4202 E. Fowler Ave. LIB618                           +
> + Office Phone: 1 (813) 974-1467                       +
> + Mobile Phone: 1 (813) 230-3441                       +
> + Organization URL: http://rc.usf.edu                  +
> --------------------------------------------------------
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net