[GE users] qdel does not kill the job when using pag command?

Duc Bao Ta ta at physik.uni-bonn.de
Wed May 30 09:27:39 BST 2007


Hi,

I have read the postings about how qdel kills a job, but which process
does it actually signal, i.e. which process group does it kill?
My problem is that qdel does not delete the job; the job remains in the
"dr" state. When I look at the process tree I see the following (I
hope it is readable):

USER  PPID   PID  PGID   SID TPGID STAT   UID  COMMAND
root     1  3267  3267  2480    -1 S        0 /opt/sge/bin/lx24-x86/sge_execd
root  3267  3273  3267  2480    -1 S        0   \_ /bin/sh /opt/sge/util/loadsensor.sh
root  3267 19737  3267  2480    -1 S        0   \_ /bin/sh /opt/sge/util/pag -c exec /opt/sge/bin/lx24-x86/sge_shepherd -bg
root 19737 19741 19741  2480    -1 S        0   |   \_ /opt/sge/bin/lx24-x86/sge_shepherd -bg
root 19741 19754 19741  2480    -1 S        0   |       \_ /opt/sge/bin/lx24-x86/sge_coshepherd /opt/sge/util/set_token_cmd duc 86400
duc  19741 19981 19981 19981    -1 SNs   1025   |       \_ /bin/bash /opt/sge/sunfire/spool/silab03/job_scripts/281
duc  19981 19983 19981 19981    -1 SN    1025   |           \_ sleep 2222222
root  3267 21297  3267  2480    -1 S        0   \_ /bin/sh /opt/sge/util/pag -c exec /opt/sge/bin/lx24-x86/sge_shepherd -bg
root 21297 21301 21301  2480    -1 S        0       \_ /opt/sge/bin/lx24-x86/sge_shepherd -bg
root 21301 21314 21301  2480    -1 S        0           \_ /opt/sge/bin/lx24-x86/sge_coshepherd /opt/sge/util/set_token_cmd duc 86400
duc  21301 21698 21698 21698    -1 SNs   1025           \_ /bin/bash /opt/sge/sunfire/spool/silab03/job_scripts/294
duc  21698 21699 21698 21698    -1 SN    1025               \_ sleep 2222222
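
To make the process-group layout in the listing above easier to see, here is a small Python sketch that groups the listed PIDs by their PGID. The rows are copied from the listing (commands shortened to their last path component); the helper name group_by_pgid is only an illustration and not part of Grid Engine.

```python
from collections import defaultdict

# Rows copied from the process listing: (user, ppid, pid, pgid, sid, command).
# Commands are abbreviated for readability.
ROWS = [
    ("root", 1, 3267, 3267, 2480, "sge_execd"),
    ("root", 3267, 3273, 3267, 2480, "loadsensor.sh"),
    ("root", 3267, 19737, 3267, 2480, "pag"),
    ("root", 19737, 19741, 19741, 2480, "sge_shepherd"),
    ("root", 19741, 19754, 19741, 2480, "sge_coshepherd"),
    ("duc", 19741, 19981, 19981, 19981, "job_scripts/281"),
    ("duc", 19981, 19983, 19981, 19981, "sleep"),
    ("root", 3267, 21297, 3267, 2480, "pag"),
    ("root", 21297, 21301, 21301, 2480, "sge_shepherd"),
    ("root", 21301, 21314, 21301, 2480, "sge_coshepherd"),
    ("duc", 21301, 21698, 21698, 21698, "job_scripts/294"),
    ("duc", 21698, 21699, 21698, 21698, "sleep"),
]


def group_by_pgid(rows):
    """Return {pgid: [(pid, command), ...]} in listing order (hypothetical helper)."""
    groups = defaultdict(list)
    for _user, _ppid, pid, pgid, _sid, command in rows:
        groups[pgid].append((pid, command))
    return dict(groups)


if __name__ == "__main__":
    for pgid, members in sorted(group_by_pgid(ROWS).items()):
        print(pgid, members)
```

Grouped this way, each job script and its sleep child share a process group of their own (19981 and 21698), separate from the group of the sge_shepherd and sge_coshepherd above them (19741 and 21301), while the pag wrappers stay in the group of sge_execd (3267).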

There are two jobs, both still running after a forced deletion performed
as a manager user. I am using the set_token_cmd and pag_cmd options to
get my Kerberos tickets and AFS tokens, so I rely on this job execution scheme.
Basically


When I manually kill the "job_scripts" processes as root (SIGTERM and
SIGKILL), the jobs terminate as desired (i.e. the epilog script is
executed). When I try to kill the set_token_cmd process, nothing happens.
When I kill the "sge_shepherd -bg" process, the job terminates
immediately without calling the epilog script.
Will the terminate method of the queue help here? Or should I modify the
set_token_cmd and pag_cmd scripts?
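
For reference, the manual kill described above can be reproduced outside Grid Engine with a small Python sketch. It only illustrates signalling a whole process group on a POSIX system: a stand-in "job" is started as leader of a new session and process group (like the job_scripts processes in the listing), and SIGTERM, then SIGKILL if needed, is sent to that group. The function names and the 2 second grace period are assumptions for illustration; this is not how sge_shepherd is implemented.

```python
import os
import signal
import subprocess


def start_job_like_process():
    """Start a shell with a sleeping child in its own session and process group."""
    # start_new_session=True makes the child call setsid() before exec,
    # so the shell becomes session leader and process group leader.
    return subprocess.Popen(
        ["/bin/sh", "-c", "sleep 300 & wait"],
        start_new_session=True,
    )


def terminate_group(proc, grace=2.0):
    """Send SIGTERM to the whole process group, then SIGKILL after a grace period."""
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)      # reaches the shell and its sleep child
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)  # escalate if SIGTERM was not enough
        proc.wait()
    return proc.returncode


if __name__ == "__main__":
    job = start_job_like_process()
    print("job pid", job.pid, "pgid", os.getpgid(job.pid))
    print("return code", terminate_group(job))
```

A signal sent to one process group does not reach processes in other groups, so a signal aimed only at the shepherd's group would leave a job script that sits in its own session and group untouched.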


Cheers
Duc
