[GE users] qdel not deleting all mpi slave tasks

Lengyel, Florian FLengyel at gc.cuny.edu
Thu Jun 10 22:15:53 BST 2004


I've tried the modification suggested:
> MPICH_PROCESS_GROUP=no
> export MPICH_PROCESS_GROUP

but a qdel leaves behind a sleeping (S) task and a zombie (Z):

  5:14pm  up 63 days,  7:37,  6 users,  load average: 0.03, 0.18, 0.20 
108 processes: 104 sleeping, 1 running, 3 zombie, 0 stopped 
CPU0 states:  0.0% user,  0.3% system,  0.0% nice, 99.2% idle 
CPU1 states:  0.0% user,  0.4% system,  0.0% nice, 99.1% idle 
CPU2 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle 
CPU3 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle 
Mem:  1030676K av,  937772K used,   92904K free,       0K shrd,  160444K buff
Swap: 2096440K av,   55884K used, 2040556K free                  537660K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
23201 kuklov     9   0     0    0     0 Z     0.0  0.0   0:26 exe_sf <defunct>
23253 kuklov     9   0   736  736   580 S     0.0  0.0   0:00 exe_sf 
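The `top` listing above shows a zombie (`Z`) and a sleeping (`S`) `exe_sf`. A zombie stays in the process table until its parent calls `wait()` to reap it (or until the parent exits and init adopts and reaps it), so the useful follow-up is to find which parent is still holding it. A minimal Python sketch, assuming the Linux procps output format of `ps -eo pid,ppid,stat,comm`; the function names are hypothetical:

```python
"""Sketch: list zombie processes together with the parent that has not
reaped them. Assumes text in the format of `ps -eo pid,ppid,stat,comm`
(a header row first, then one process per line)."""
import subprocess


def find_zombies(ps_text):
    """Return (pid, ppid, command) tuples for every zombie in ps_text."""
    zombies = []
    lines = ps_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # ignore malformed lines
        pid, ppid, stat, comm = fields
        if stat.startswith("Z"):  # "Z" marks a defunct (zombie) process
            zombies.append((int(pid), int(ppid), comm.strip()))
    return zombies


def live_zombies():
    """Run ps on the local host (procps assumed) and parse its output."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_zombies(out)
```

Run on the execution host after the qdel, `live_zombies()` returns the same tuples for the live system; the reported parent PID is the process that still has to exit or call `wait()` before the zombie disappears.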

So I would conclude that this doesn't work...

FL

-----Original Message-----
From: Lengyel, Florian [mailto:FLengyel at gc.cuny.edu] 
Sent: Thursday, June 10, 2004 8:29 AM
To: 'Ron Chen '; 'users at gridengine.sunsource.net '
Subject: RE: [GE users] qdel not deleting all mpi slave tasks

Well, I would think that they are, given that the job is a modification
of the sample mpi script--unless I'm missing something. Here's a typical
example:

#$ -S /bin/csh
#$ -N mpitest
#
# pe request
#$ -pe mpich 2-6
#
# MPIR_HOME from submitting environment
#$ -v MPIR_HOME=/usr/pgi/linux86,COMMD_PORT
# ---------------------------

#
# needs in
#   $NSLOTS
#       the number of tasks to be used
#   $TMPDIR/machines
#       a valid machine file to be passed to mpirun

echo "Got $NSLOTS slots."
echo "tmpdir is $TMPDIR"

# enables $TMPDIR/rsh to catch rsh calls if available
set path=($TMPDIR $path)

$MPIR_HOME/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines /usr/pgi/cdk/mpich/examples/cpi
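The script depends on `$NSLOTS` and on `$TMPDIR/machines`, which `startmpi.sh` builds from `$pe_hostfile`. The conversion can be sketched in Python, assuming the usual pe_hostfile layout of `host slots queue processors` on each line; this illustrates the idea (one machines-file line per granted slot), not the exact code of `startmpi.sh`, and the function names are hypothetical:

```python
"""Sketch: turn an SGE pe_hostfile into an MPICH machines file.

Assumption: each pe_hostfile line reads "host slots queue processors",
and mpirun expects one machines-file line per granted slot.
"""


def pe_hostfile_to_machines(pe_hostfile_text):
    """Return machines-file lines: each host repeated once per slot."""
    machines = []
    for line in pe_hostfile_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        machines.extend([host] * slots)
    return machines


def total_slots(pe_hostfile_text):
    """Total slots granted, which is what $NSLOTS should equal."""
    return len(pe_hostfile_to_machines(pe_hostfile_text))
```

Writing `"\n".join(pe_hostfile_to_machines(text))` to a file gives a list in the shape `mpirun -machinefile` consumes; if `-np $NSLOTS` and the machines file disagree, mpirun may place processes on hosts SGE did not grant.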

qsub mpi.sh yields:

[flengyel at monad flengyel]$ qstat -u flengyel
job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID
---------------------------------------------------------------------------------------------
  19602     0 mpitest    flengyel     r     06/10/2004 08:24:55 idle03.q   MASTER
            0 mpitest    flengyel     r     06/10/2004 08:24:55 idle03.q   SLAVE
  19602     0 mpitest    flengyel     r     06/10/2004 08:24:55 idle04.q   SLAVE
  19602     0 mpitest    flengyel     r     06/10/2004 08:24:55 idle05.q   SLAVE
  19602     0 mpitest    flengyel     r     06/10/2004 08:24:55 idle06.q   SLAVE
  19602     0 mpitest    flengyel     r     06/10/2004 08:24:55 idle07.q   SLAVE
  19602     0 mpitest    flengyel     r     06/10/2004 08:24:55 idle08.q   SLAVE
[flengyel at monad flengyel]$

I'll check what the user is doing, but the idea was to use the
mpich.template, which, as I now recall, was used for its tight
integration, and to slavishly follow the SGE mpi example scripts.
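Since the tight-integration difference between the two templates discussed in this thread comes down to `control_slaves TRUE` and `job_is_first_task FALSE` (mpich.template) versus the reverse (mpi.template), a PE definition can be checked against those two values. A small Python sketch; the parser assumes the `attribute value` layout printed by `qconf -sp <pe_name>`, and the function names are hypothetical:

```python
"""Sketch: check a parallel-environment definition for the two settings
that separate mpich.template (tight integration) from mpi.template."""

# Values taken from mpich.template as quoted in this thread.
TIGHT_INTEGRATION = {"control_slaves": "TRUE", "job_is_first_task": "FALSE"}


def parse_pe(text):
    """Return {attribute: value} from a PE configuration listing."""
    # Join backslash-continued lines, in case a long value was wrapped.
    joined = text.replace("\\\n", " ")
    conf = {}
    for line in joined.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            # Normalise internal whitespace in the value.
            conf[parts[0]] = " ".join(parts[1].split())
    return conf


def tight_integration_problems(conf):
    """List the attributes that differ from the mpich.template values."""
    return [
        f"{key} is {conf.get(key)!r}, expected {want!r}"
        for key, want in TIGHT_INTEGRATION.items()
        if conf.get(key, "").upper() != want
    ]
```

Only these two attributes are checked; the `-catch_rsh` option in `start_proc_args`, which sets up the `$TMPDIR/rsh` wrapper the job script relies on, is left out of this sketch.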


-----Original Message-----
From: Ron Chen
To: users at gridengine.sunsource.net
Sent: 6/10/2004 7:56 AM
Subject: RE: [GE users] qdel not deleting all mpi slave tasks

Also, please check if the MPI tasks are started by the
SGE daemons or not.

 -Ron
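This check can be done mechanically by walking the parent chain of a slave process and looking for an SGE daemon. A minimal Python sketch for Linux, assuming the shepherd shows up in `/proc/<pid>/stat` under the name `sge_shepherd`; the helper names are hypothetical:

```python
"""Sketch: test whether a process descends from an SGE daemon.

The core check works on a {pid: (ppid, name)} table, so it can be fed
from /proc (Linux) or from any saved process listing.
"""
import os


def read_proc_table(proc="/proc"):
    """Build {pid: (ppid, name)} from Linux /proc/<pid>/stat files."""
    table = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                data = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Layout: "pid (name) state ppid ..."; the name may contain
        # spaces, so split around the last closing parenthesis.
        name = data[data.index("(") + 1:data.rindex(")")]
        ppid = int(data[data.rindex(")") + 2:].split()[1])
        table[int(entry)] = (ppid, name)
    return table


def ancestors(pid, table):
    """Yield (pid, name) for each ancestor of pid, nearest first."""
    seen = set()
    ppid = table.get(pid, (0, ""))[0]
    while ppid in table and ppid not in seen:
        seen.add(ppid)
        parent_ppid, name = table[ppid]
        yield ppid, name
        ppid = parent_ppid


def under_sge(pid, table, daemon="sge_shepherd"):
    """True if some ancestor of pid carries the given daemon name."""
    return any(name == daemon for _, name in ancestors(pid, table))
```

Run this while the job is still running, for example `under_sge(pid, read_proc_table())` for each `exe_sf` PID on a slave node. After a qdel, leftovers are usually re-parented to init, so they fail the test regardless of how they were originally started.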

--- "Lengyel, Florian" <FLengyel at gc.cuny.edu> wrote:
> Yes, I am using the tight integration template: this is how it's modified
> for my setup:
> 
> pe_name          mpich
> queue_list       all
> slots            999
> user_lists       NONE
> xuser_lists      NONE
> start_proc_args  /usr/local/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args   /usr/local/sge/mpi/stopmpi.sh
> allocation_rule  $round_robin
> control_slaves   TRUE
> job_is_first_task FALSE
> 
> As far as I can tell, the mpich.template, which the README refers to as
> the template to use for tight integration, and from which this derives,
> has
> 
> control_slaves   TRUE
> job_is_first_task FALSE
> 
> whereas the mpi.template reverses these:
> 
> control_slaves   FALSE
> job_is_first_task TRUE
> 
> 
> 
> -----Original Message-----
> From: Ron Chen
> To: users at gridengine.sunsource.net
> Sent: 6/10/2004 12:02 AM
> Subject: RE: [GE users] qdel not deleting all mpi slave tasks
> 
> See $SGE_ROOT/mpi/README.
> 
> Basically, tight integration allows SGE to start/control the slave MPI
> tasks. You should check if the slave MPI tasks are the children (or
> grandchildren, or great-grandchildren, etc.) of the SGE daemons.
> 
>  -Ron
> 
> 
> --- "Lengyel, Florian" <FLengyel at gc.cuny.edu> wrote:
> > Pardon my ignorance: what is tight integration? The answer is probably
> > no. What I did was modify the sample pe for mpi so that it would find
> > mpirun; there were no other changes, as I recall.
> > 
> 
> 
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 





More information about the gridengine-users mailing list