[GE users] mpich2 qdel problems

ajw ajw at illinois.edu
Fri Jun 5 21:21:10 BST 2009


Hi,

I've set up Grid Engine 6.2 with tight integration for mpich2 (mpd startup).
Jobs run fine, but killing them with qdel doesn't work properly.
The processes seem to be started correctly.  On hnode1 the ps output
(columns: PID, PPID, PGID, command) looks like this:

20584     1 20584 /site/local/pkg/sge/bin/lx24-amd64/sge_execd
28056 20584 28056  \_ sge_shepherd-7427 -bg
28057 28056 28057      \_ /site/local/pkg/sge/utilbin/lx24-amd64/qrsh_starter /scratch/local/sge/default/spool/hnode1/active_jobs/7427.1/1.hnode1
28067 28057 28067          \_ python2.4 /site/local/pkg/mpich2/Linux-x86_64-Boron/bin/mpd -h hnode2 -p 44971 -n
28074 28067 28074              \_ python2.4 /site/local/pkg/mpich2/Linux-x86_64-Boron/bin/mpd -h hnode2 -p 44971 -n
28076 28074 28076              |   \_ /home/ajw/mpich2/mpihello
28075 28067 28075              \_ python2.4 /site/local/pkg/mpich2/Linux-x86_64-Boron/bin/mpd -h hnode2 -p 44971 -n
28077 28075 28077                  \_ /home/ajw/mpich2/mpihello

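After a qdel, the mpihello processes don't die.  The shepherd, qrsh_starter
and mpd processes are gone, but both mpihello processes are left behind,
reparented to init (PPID 1):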

20584     1 20584 /site/local/pkg/sge/bin/lx24-amd64/sge_execd
28076     1 28076 /home/ajw/mpich2/mpihello
28077     1 28077 /home/ajw/mpich2/mpihello
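
A minimal sketch for spotting such leftovers automatically, assuming ps
output with the same columns as above (PID, PPID, PGID, command, as
produced by something like "ps -e f -o pid,ppid,pgid,cmd").  The function
names parse_ps and find_orphans are hypothetical helpers, not part of
Grid Engine or mpich2:

```python
def parse_ps(ps_text):
    """Parse 'PID PPID PGID command' lines into (pid, ppid, pgid, cmd) tuples."""
    rows = []
    for line in ps_text.splitlines():
        parts = line.split(None, 3)
        # Skip blank lines, headers and anything without three numeric columns.
        if len(parts) < 4 or not all(p.isdigit() for p in parts[:3]):
            continue
        pid, ppid, pgid = (int(p) for p in parts[:3])
        # Drop the forest-drawing characters ("|", "\_") that ps prints
        # in front of the command when a process tree is requested.
        cmd = parts[3].lstrip(" |\\_")
        rows.append((pid, ppid, pgid, cmd))
    return rows


def find_orphans(ps_text, name):
    """Return processes whose command contains `name` and whose parent is init."""
    return [row for row in parse_ps(ps_text) if row[1] == 1 and name in row[3]]
```

Feeding it the post-qdel listing above with name "mpihello" should return
the two leftover processes and skip sge_execd, which also has PPID 1 but
does not match the name.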

I've got ENABLE_ADDGRP_KILL=TRUE set, too.
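
For reference, ENABLE_ADDGRP_KILL is an execd_params entry in the cluster
configuration, which "qconf -sconf" prints.  Below is a rough sketch for
checking that the flag is really present in that output.  It assumes the
usual "name  value" line layout, with parameters separated by commas or
whitespace and long lines continued with a backslash; the function name
execd_params is a hypothetical helper:

```python
def execd_params(conf_text):
    """Extract execd_params from 'qconf -sconf' style text as a dict."""
    # Assumption: long values may be wrapped with a trailing backslash.
    joined = conf_text.replace("\\\n", " ")
    for line in joined.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] == "execd_params":
            params = {}
            # Assumption: entries are separated by commas and/or whitespace.
            for item in fields[1].replace(",", " ").split():
                key, sep, value = item.partition("=")
                params[key.upper()] = value if sep else ""
            return params
    return {}
```

Usage would be along the lines of capturing the output of "qconf -sconf"
into a string and checking that
execd_params(text).get("ENABLE_ADDGRP_KILL") equals "TRUE".  A host-local
configuration, if one exists for the execution host, may be worth checking
the same way.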

On the master node the job seems to get killed ok.

Thanks for any help.

Andy

-- 
andy wettstein
unix administrator
department of physics
university of illinois at urbana-champaign

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201018
