[GE users] Mvapich processes not killed on qdel

Reuti reuti at staff.uni-marburg.de
Wed May 9 17:27:05 BST 2007


Hi,

can you please post the process tree (master and slave) of a running
job on a node, using the ps command:

ps -e f -o pid,ppid,pgrp,command
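
If the full listing is too long to post, a filter along these lines could narrow it down. This is only a sketch: the helper name filter_job_tree is made up for illustration, and the patterns are guesses based on the process names mentioned in this thread (mpirun_ssh, sander.MPI) plus the usual SGE daemons (sge_execd, sge_shepherd). The unfiltered tree is still the more reliable thing to look at, since filtering can drop intermediate parents.

```shell
# Hypothetical helper: reads `ps` output on stdin and keeps the header
# line plus any line that mentions the SGE daemons, a remote-start
# command, the MPI launcher, or the application binary.
filter_job_tree() {
    awk 'NR == 1 || /sge_execd|sge_shepherd|qrsh|rsh|ssh|mpirun|sander\.MPI/'
}
```

On a node it would be fed from the same command as above: ps -e f -o pid,ppid,pgrp,command | filter_job_tree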

Are you sure that the SGE rsh wrapper is actually being used, given that
you mentioned mpirun_ssh?
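
One way to check this from inside the job script is to see which rsh/ssh the launcher would pick up from PATH. The following is only a sketch under assumptions: it assumes the -catch_rsh setup places its wrapper in the job's $TMPDIR and that the job script puts $TMPDIR first in PATH; the helper name resolves_to_tmpdir is made up for illustration and is not part of SGE.

```shell
# Hypothetical helper: succeeds if the given command name resolves
# (via PATH) to a file inside $TMPDIR, fails otherwise.
resolves_to_tmpdir() {
    path=$(command -v "$1" 2>/dev/null) || return 1
    case "$path" in
        "${TMPDIR:-/tmp}"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report what rsh and ssh resolve to in the current environment.
for cmd in rsh ssh; do
    if resolves_to_tmpdir "$cmd"; then
        echo "$cmd: resolves to a wrapper in TMPDIR"
    else
        echo "$cmd: resolves to $(command -v "$cmd" 2>/dev/null || echo nothing), not a TMPDIR wrapper"
    fi
done
```

If neither name resolves into $TMPDIR, the slave processes are probably being started outside of SGE's control, which would fit the symptom of them surviving a qdel.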

-- Reuti


On 09.05.2007 at 17:43, Mike Hanby wrote:

> Howdy,
>
> I have GE 6.0u8 on a Rocks 4.2.1 cluster with Infiniband and the  
> Topspin roll (which includes mvapich).
>
>
>
> When I qdel an mvapich job, the job is immediately removed from the
> queue; however, most of the processes on the nodes do not get
> killed. It appears that the mpirun_ssh process does get killed,
> but the actual job executables (sander.MPI) do not.
>
>
>
> I followed the directions for tight integration of Mvapich:
>
> http://gridengine.sunsource.net/project/gridengine/howto/mvapich/MVAPICH_Integration.html
>
>
>
> The job runs fine, but again it doesn't kill off processes when  
> qdel'd.
>
>
>
> Here's the pe:
>
> $ qconf -sp mvapich
> pe_name           mvapich
> slots             9999
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   /share/apps/gridengine/mvapich/startmpi.sh -catch_rsh \
>                   $pe_hostfile
> stop_proc_args    /share/apps/gridengine/mvapich/stopmpi.sh
> allocation_rule   $round_robin
> control_slaves    TRUE
> job_is_first_task FALSE
> urgency_slots     min
>
>
>
> The only modification made to the startmpi.sh script was to change
> the location of the hostname and rsh scripts from $SGE_ROOT to
> /share/apps/gridengine/mvapich
>
>
>
> Any suggestions on what I should look for?
>
>
>
> Thanks, Mike
>
>
>
>
