[GE users] Mvapich processes not killed on qdel

Mike Hanby mhanby at uab.edu
Wed May 9 16:43:34 BST 2007



I have GE 6.0u8 on a Rocks 4.2.1 cluster with Infiniband and the Topspin
roll (which includes mvapich).


When I qdel an mvapich job, the job is immediately removed from the
queue, but most of its processes on the nodes are not killed. It
appears that the mpirun_ssh process does get killed, but the
actual job executables (sander.MPI) do not.
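
For anyone wanting to reproduce the check, here is a minimal sketch of how the leftover processes could be listed on a node. It assumes that the leftovers have been reparented to init (PPID 1) after their launcher died, which may not hold on every system. The helper name orphans and the node name compute-0-0 are placeholders, not part of the actual setup.

```shell
# Print the PIDs of processes with a given command name whose parent is
# init (PPID 1), i.e. processes that outlived whatever launched them.
# ASSUMPTION: leftover job processes are reparented to init.
# Reads "PID PPID COMMAND" lines on stdin, as produced by:
#   ps -e -o pid=,ppid=,comm=
orphans() {
    awk -v name="$1" '$2 == 1 && $3 == name { print $1 }'
}

# Example use against one node (compute-0-0 is a placeholder host name):
#   ssh compute-0-0 ps -e -o pid=,ppid=,comm= | orphans sander.MPI
```

Running this on each node after the qdel shows whether any sander.MPI processes are still present without a live parent.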


I followed the directions for tight integration of Mvapich



The job runs fine, but again it doesn't kill off processes when qdel'd.


Here's the pe:

$ qconf -sp mvapich
pe_name           mvapich
slots             9999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /share/apps/gridengine/mvapich/startmpi.sh -catch_rsh
stop_proc_args    /share/apps/gridengine/mvapich/stopmpi.sh
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min


The only modification made to the startmpi.sh script was to change the
location of the hostname and rsh scripts from $SGE_ROOT to


Any suggestions on what I should look for?


Thanks, Mike

