

Wed Jan 12 20:38:46 GMT 2011


> killkids.sh:
> 
> #!/bin/sh
> #
> # Argument $1 must be the process id (i.e. $job_pid)
> #
> sleep 5
> /home/reuti/getkids.sh $1 | xargs kill -9
> exit 0
> 
> 
> getkids.sh (like before):
> 
> #!/bin/sh
> #
> # Argument $1 must be the process id (i.e. $job_pid)
> #
> group=`awk '/^Groups/ { for (i=2;i<=NF;i++) if ($i>=20000 && $i<=21000) { 
> print $i }}' /proc/$1/status`
> for process in /proc/[0-9]*; do
>     awk '/^Groups/ { for (i=2;i<=NF;i++) if ($i==group) { print process }}' \
>         process=${process##*/} group=$group $process/status
> done
> 
> An entry in the queue (for testing purpose):
> 
> reuti at pc15370:~> qconf -sq all.q
> ...
> terminate_method      /home/reuti/killkids.sh $job_pid
> 
> 
> I added the "sleep 5" to prevent the master from being killed first: the
> process on a slave would then vanish too fast once the mpd is gone,
> resulting again in SGE not doing anything on the slave node.
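
A minimal sketch of the getkids.sh logic split into two shell functions, assuming (as the script above does) that the cluster's gid_range is 20000-21000. The names job_gid and pids_in_group, and the optional second argument that points pids_in_group at a directory other than /proc, are hypothetical and exist only so the pieces can be tried without a running job. pids_in_group returns early when no additional group was found.

```shell
#!/bin/sh
# Sketch of the getkids.sh logic as reusable functions (hypothetical names).
# Assumption: the cluster's gid_range is 20000-21000, as in getkids.sh.
GID_MIN=20000
GID_MAX=21000

# job_gid STATUS_FILE: print the first supplementary group ID inside the
# gid_range, i.e. the additional group SGE attached to the job.
job_gid() {
    awk -v lo="$GID_MIN" -v hi="$GID_MAX" '
        /^Groups:/ { for (i = 2; i <= NF; i++)
                         if ($i >= lo && $i <= hi) { print $i; exit } }' "$1"
}

# pids_in_group GID [PROC_ROOT]: print every PID whose status file lists GID.
# PROC_ROOT defaults to /proc; it is a parameter only so the function can be
# tried against a fake directory tree.
pids_in_group() {
    gid=$1
    root=${2:-/proc}
    [ -n "$gid" ] || return 0          # no additional group found: print nothing
    for status in "$root"/[0-9]*/status; do
        [ -r "$status" ] || continue   # the process may have exited meanwhile
        pid=${status%/status}
        pid=${pid##*/}
        awk -v g="$gid" -v p="$pid" '
            /^Groups:/ { for (i = 2; i <= NF; i++)
                             if ($i == g) { print p; exit } }' "$status" 2>/dev/null
    done
}
```

On a node the two would be combined as pids_in_group "$(job_gid /proc/$job_pid/status)", which should list the same PIDs as getkids.sh for a job carrying a single additional group ID.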

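A hedged sketch of killkids.sh wrapped in a function, for sites that want to skip the kill call when getkids.sh finds nothing: GNU xargs runs a bare "kill -9" once even on empty input, which only produces an error message (with GNU xargs, "xargs -r kill -9" is a shorter alternative). GETKIDS and KILL_DELAY are hypothetical variables introduced here; their defaults reproduce the path and the 5 second delay of the original script.

```shell
#!/bin/sh
# Sketch of killkids.sh as a function. GETKIDS and KILL_DELAY are hypothetical
# knobs; the defaults match the original script's path and "sleep 5".
GETKIDS=${GETKIDS:-/home/reuti/getkids.sh}
KILL_DELAY=${KILL_DELAY:-5}

# killkids JOB_PID: wait, then SIGKILL every PID that $GETKIDS reports.
killkids() {
    [ -n "$1" ] || return 1   # the job's PID (SGE's $job_pid) is required
    # Delay as in the original, so the master is not killed first.
    sleep "$KILL_DELAY"
    # Collect the PIDs first; an empty list must not become a bare "kill -9".
    pids=$("$GETKIDS" "$1")
    if [ -n "$pids" ]; then
        # Left unquoted on purpose: each PID becomes a separate argument.
        # PIDs that exited meanwhile only cause an error, which is discarded.
        kill -9 $pids 2>/dev/null || true
    fi
    return 0
}

# As terminate_method ("killkids.sh $job_pid"), act on the first argument.
if [ -n "${1:-}" ]; then
    killkids "$1"
fi
```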

Anyway, with the Hydra startup in the forthcoming MPICH2 version 1.3, this will be much easier to handle.

NB, an interesting bit of history: LAM/MPI moved with Open MPI to an implementation where the daemons are not running all the time, while MPICH(1) moved with MPICH2 to a new default with daemons, and now with Hydra it is moving back again.

-- Reuti

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=280046




More information about the gridengine-users mailing list