[GE users] Yet another qdel mpich problem (SGE 6.0u1)

Vladimir Florinski vflorins at ucr.edu
Wed Sep 8 19:52:54 BST 2004


On Wed, 2004-09-08 at 07:53, Sean Dilda wrote:
> On Tue, 2004-09-07 at 19:54, Vladimir Florinski wrote:
> > It appears the problem with the qdel command (inability to terminate the
> > child processes) continues to haunt MPI users.
> 
> I do not have this problem on my cluster. I used to have it, but fixed
> it by hacking mpich to not create its own process group by default.
> There's also a similar fix you can do in the submit script.
> 
> > #$ -N inst-nn-6
> > #$ -cwd
> > #$ -pe mpi 2-10
> > #$ -v MPIR_HOME
> > /opt/mpich-gm/bin/mpirun.ch_gm --gm-no-shmem \
> >   -machinefile $TMPDIR/machines --gm-kill 15 -np $NSLOTS \
> >   ./mpi_main -new 100.0
> 
> Is this script in tcsh?  If so, try adding the line:
> setenv MPICH_PROCESS_GROUP no
> 
> before your call to mpirun.
> 
> One note, I don't use the myrinet extensions, so I'm not sure if that
> will change things or not.

I forgot to mention that I tried this suggestion already using

export MPICH_PROCESS_GROUP=no

(my startup is a bash script), but it didn't fix the qdel problem. Is
this supposed to work with MPICH 1.2.5? Perhaps the version of MPICH
distributed by Myricom does not have this enabled...
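
For reference, here is a minimal sketch of how the bash version of the submit script might look with the variable exported before the mpirun call. The `-S /bin/bash` directive, the `MPIRUN` and `child_value` names, and the existence check around the launch are my own additions for illustration, not part of the original script. The sketch only confirms that the variable reaches child processes; whether the Myricom build of MPICH actually reads it is exactly the open question.

```shell
#!/bin/bash
#$ -N inst-nn-6
#$ -cwd
#$ -pe mpi 2-10
#$ -v MPIR_HOME
#$ -S /bin/bash
# (the -S line above is an assumption: it asks SGE to run the job under
# bash, so that "export" is valid syntax)

# bash equivalent of the tcsh line "setenv MPICH_PROCESS_GROUP no"
export MPICH_PROCESS_GROUP=no

# Sanity check: an exported variable must be visible to a child process.
child_value=$(sh -c 'printf %s "$MPICH_PROCESS_GROUP"')
echo "child process sees MPICH_PROCESS_GROUP=$child_value"

# Hypothetical helper variable holding the launcher path from the script.
MPIRUN=/opt/mpich-gm/bin/mpirun.ch_gm

# Launch only if the launcher exists on this host.
if [ -x "$MPIRUN" ]; then
    "$MPIRUN" --gm-no-shmem -machinefile "$TMPDIR/machines" \
        --gm-kill 15 -np "$NSLOTS" ./mpi_main -new 100.0
else
    echo "mpirun.ch_gm not found at $MPIRUN; skipping launch" >&2
fi
```

If the echo line in the job output shows the value "no" and qdel still leaves the children running, the environment is being set correctly and the variable is simply not honored by this mpirun.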

-- 
Vladimir Florinski
Assistant Research Physicist
Institute of Geophysics and Planetary Physics
University of California
Riverside, CA 92521
phone: 1-909-787-3943
fax: 1-909-787-4509

