[GE users] Qdel problem

Reuti reuti at staff.uni-marburg.de
Tue Oct 3 20:20:56 BST 2006


Hi,

On 03.10.2006 at 21:06, Liang Ge wrote:

> Here is my script:
> ---------------------------------------------------------
> #!/bin/bash
>
> #$ -S /bin/bash
> #$ -pe mpich 8
> #$ -o temp
>
> cd $SGE_O_WORKDIR
> export MPICH_PROCESS_GROUP=no
>
> /opt/mpich-mx.gcc/bin/mpirun -machinefile $TMPDIR/machines -np $NSLOTS \
>     $SGE_O_WORKDIR/testt
> ----------------------------------------------------
>
> I can successfully submit the job with qsub. But when I try to remove
> it with qdel, only one process is killed and the other 7 processes
> keep running.
>
> I tried solutions #2 and #3 as described on the web page by Reuti (I
> couldn't follow #1), namely changing the rsh_wrapper and recompiling
> MPICH from the patched source code. I still got the same result as
> before: qdel kills only one process.
>

We have no Myrinet here, but I heard that #1 is no longer necessary,
as the scripts provided by Myrinet have changed. One thing to look at
is how the slave processes are started: are they launched by rsh or
ssh? The call might be in one of the follow-up scripts invoked by your
mpirun command.
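
A quick way to look for the remote-shell call might be to grep the launcher for rsh or ssh. This is only a sketch: it assumes the launcher is a text script (I can't verify that for the Myrinet version), and the function name find_remote_shell_calls is made up for illustration.

```shell
#!/bin/bash
# find_remote_shell_calls FILE (hypothetical helper):
# print, with line numbers, every line of FILE that mentions
# rsh or ssh as a whole word.
find_remote_shell_calls() {
    local script=$1
    grep -nwE 'rsh|ssh' "$script"
}
```

For the job script above that would be something like `find_remote_shell_calls /opt/mpich-mx.gcc/bin/mpirun`, repeated for any follow-up script it calls.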

Can you check a running job with "ps -e f" to have a look at the
process tree? Are all processes bound to sge_shepherd on the slave
nodes?
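
If reading the tree by eye gets tedious, the same check could be scripted roughly as below. It is a sketch under assumptions: the helper names has_ancestor and check_procs are made up, and it assumes the shepherd shows up in ps under the command name "sge_shepherd".

```shell
#!/bin/bash
# has_ancestor PID NAME (hypothetical helper):
# reads "pid ppid comm" lines on stdin and succeeds if some
# ancestor of PID has the command name NAME.
has_ancestor() {
    local pid=$1 name=$2
    awk -v pid="$pid" -v name="$name" '
        { ppid[$1] = $2; comm[$1] = $3 }
        END {
            # Walk up the parent chain until we leave the known PIDs.
            p = pid
            while ((p in ppid) && !(p in seen)) {
                seen[p] = 1
                p = ppid[p]
                if ((p in comm) && comm[p] == name) exit 0
            }
            exit 1
        }'
}

# check_procs PROG (hypothetical helper):
# for every local process named exactly PROG, report whether it
# runs below an sge_shepherd (assumed command name).
check_procs() {
    local prog=$1 pid
    for pid in $(pgrep -x "$prog"); do
        if ps -e -o pid= -o ppid= -o comm= | has_ancestor "$pid" sge_shepherd; then
            echo "$pid: under sge_shepherd"
        else
            echo "$pid: NOT under sge_shepherd"
        fi
    done
}
```

Running `check_procs testt` on each slave node while the job is active should then list the MPI processes. Any that are reported as not under sge_shepherd are the ones qdel would be expected to miss.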

-- Reuti
