[GE users] Mvapich processes not killed on qdel

Mike Hanby mhanby at uab.edu
Wed May 9 16:43:34 BST 2007


Howdy,

I have GE 6.0u8 on a Rocks 4.2.1 cluster with Infiniband and the Topspin
roll (which includes mvapich).

When I qdel an mvapich job, the job is immediately removed from the
queue, but most of its processes on the nodes do not get killed. It
appears that the mpirun_ssh process does get killed, but the actual
job executables (sander.MPI) do not.
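
One way to see which processes have escaped the job's control on a node
is to check whether each sander.MPI process has an sge_shepherd among
its ancestors; tasks started through qrsh -inherit under tight
integration normally do. The following is only a minimal sketch: the
function name classify_by_shepherd is made up for illustration, and it
assumes the shepherd appears in ps with a command name beginning with
sge_shepherd.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine or mvapich).
# Reads "PID PPID COMMAND" lines on stdin and, for every process whose
# command name equals $1, prints "PID tracked" if some ancestor's command
# name starts with sge_shepherd, otherwise "PID untracked".
classify_by_shepherd() {
    awk -v target="$1" '
        # Record the parent and command name of every process.
        { ppid[$1] = $2; comm[$1] = $3 }
        END {
            for (p in comm) {
                if (comm[p] != target) continue
                status = "untracked"
                # Walk up the parent chain until the PID is unknown
                # (e.g. PID 0) or a hop limit is reached.
                q = ppid[p]
                for (hops = 0; hops < 64 && (q in comm); hops++) {
                    if (comm[q] ~ /^sge_shepherd/) { status = "tracked"; break }
                    q = ppid[q]
                }
                print p, status
            }
        }
    ' | sort -n
}
```

On a compute node it could be fed from ps, for example:

    ps -e -o pid= -o ppid= -o comm= | classify_by_shepherd sander.MPI

Processes reported as untracked were presumably started outside of the
shepherd, which would explain why qdel cannot reach them.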

 

I followed the directions for tight integration of Mvapich:

http://gridengine.sunsource.net/project/gridengine/howto/mvapich/MVAPICH_Integration.html

The job runs fine, but again it doesn't kill off processes when qdel'd.

Here's the pe:

$ qconf -sp mvapich
pe_name           mvapich
slots             9999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /share/apps/gridengine/mvapich/startmpi.sh -catch_rsh \
                  $pe_hostfile
stop_proc_args    /share/apps/gridengine/mvapich/stopmpi.sh
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min

The only modification made to the startmpi.sh script was to change the
location of the hostname and rsh scripts from $SGE_ROOT to
/share/apps/gridengine/mvapich.

Any suggestions on what I should look for?

Thanks, Mike

 



