[GE users] Yet another qdel mpich problem (SGE 6.0u1)

Sean Dilda agrajag at dragaera.net
Wed Sep 8 15:53:22 BST 2004


On Tue, 2004-09-07 at 19:54, Vladimir Florinski wrote:
> It appears the problem with the qdel command (inability to terminate the
> children processes) continues to haunt MPI users. 

I do not have this problem on my cluster.  I used to have it, but fixed
it by hacking mpich so that it does not create its own process group by
default.  There's also a similar fix you can apply in the submit script.

> #$ -N inst-nn-6
> #$ -cwd
> #$ -pe mpi 2-10
> #$ -v MPIR_HOME
> /opt/mpich-gm/bin/mpirun.ch_gm --gm-no-shmem -machinefile
> $TMPDIR/machines --gm-kill 15 -np $NSLOTS ./mpi_main -new 100.0

Is this script in tcsh?  If so, try adding the line:

    setenv MPICH_PROCESS_GROUP no

before your call to mpirun.
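
If the script turns out to be in a Bourne-style shell (sh or bash)
instead, the equivalent would be roughly the sketch below.  It assumes,
as above, that your mpich build reads MPICH_PROCESS_GROUP from the
environment; the check line is only there to illustrate that child
processes such as mpirun inherit the setting, and can be dropped.

```shell
#!/bin/sh
# Bourne-shell equivalent of the tcsh line "setenv MPICH_PROCESS_GROUP no".
MPICH_PROCESS_GROUP=no
export MPICH_PROCESS_GROUP

# Illustrative check: a child process sees the exported value.
sh -c 'echo "MPICH_PROCESS_GROUP=$MPICH_PROCESS_GROUP"'

# The existing mpirun command from the job script follows here, unchanged.
```

Either way, the variable has to be set before mpirun starts, since
mpirun only inherits what is in the environment at that point.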

One caveat: I don't use the Myrinet extensions, so I'm not sure whether
they will change things or not.

