[GE users] qdel does not kill the job when using pag command?

Reuti reuti at staff.uni-marburg.de
Wed May 30 10:35:58 BST 2007


On 30.05.2007 at 10:27, Duc Bao Ta wrote:

> Hi,
>
> I have read the postings about how qdel kills a job, but which process
> does it kill, i.e. which process group?
> My problem is that qdel does not delete the job; the job remains in
> "dr" state. When I look at the process tree, I see the following (I
> hope it is readable):
>
> USER   PPID   PID  PGID   SID TPGID STAT   UID  COMMAND
> root      1  3267  3267  2480    -1 S        0  /opt/sge/bin/lx24-x86/sge_execd
> root   3267  3273  3267  2480    -1 S        0   \_ /bin/sh /opt/sge/util/loadsensor.sh
> root   3267 19737  3267  2480    -1 S        0   \_ /bin/sh /opt/sge/util/pag -c exec /opt/sge/bin/lx24-x86/sge_shepherd -bg
> root  19737 19741 19741  2480    -1 S        0   |   \_ /opt/sge/bin/lx24-x86/sge_shepherd -bg
> root  19741 19754 19741  2480    -1 S        0   |       \_ /opt/sge/bin/lx24-x86/sge_coshepherd /opt/sge/util/set_token_cmd duc 86400
> duc   19741 19981 19981 19981    -1 SNs   1025   |       \_ /bin/bash /opt/sge/sunfire/spool/silab03/job_scripts/281
> duc   19981 19983 19981 19981    -1 SN    1025   |           \_ sleep 2222222
> root   3267 21297  3267  2480    -1 S        0   \_ /bin/sh /opt/sge/util/pag -c exec /opt/sge/bin/lx24-x86/sge_shepherd -bg
> root  21297 21301 21301  2480    -1 S        0       \_ /opt/sge/bin/lx24-x86/sge_shepherd -bg
> root  21301 21314 21301  2480    -1 S        0           \_ /opt/sge/bin/lx24-x86/sge_coshepherd /opt/sge/util/set_token_cmd duc 86400
> duc   21301 21698 21698 21698    -1 SNs   1025           \_ /bin/bash /opt/sge/sunfire/spool/silab03/job_scripts/294
> duc   21698 21699 21698 21698    -1 SN    1025               \_ sleep 2222222
>
> There are two jobs still running after a forced deletion as a manager

For the first job, the kill should go to process group -19981, which
contains both the bash and the sleep. Can you check the SGE messages file
in the spool directory of this node to see whether it was issued? (You may
have to set "loglevel log_info" in the SGE configuration.)
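
As a sketch of what "killed with -19981" means: the negative number addresses the process group, so one signal reaches both the job's bash (PID 19981, which leads its own session and group, hence the "s" in "SNs") and its child sleep. The Python below is a minimal illustration, not SGE's own code; it assumes a Linux node with /proc (as the lx24-x86 architecture suggests), and the helper names live_members and kill_process_group and the grace period are hypothetical choices.

```python
import os
import signal
import subprocess
import time


def live_members(pgid: int) -> list[int]:
    """Return the PIDs in process group `pgid` that are not zombies.

    Hypothetical helper, Linux-only (assumes /proc): reads /proc/<pid>/stat,
    the same data behind the PGID column of `ps`.
    """
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # Field 2 (comm) may contain spaces or parens, so split after the
        # last ')'. The remaining fields start with: state ppid pgrp session ...
        fields = stat.rpartition(")")[2].split()
        state, pgrp = fields[0], int(fields[2])
        if pgrp == pgid and state != "Z":
            pids.append(int(entry))
    return pids


def kill_process_group(pgid: int, grace: float = 5.0) -> bool:
    """Signal the whole group like `kill -TERM -- -<pgid>`, then SIGKILL.

    Hypothetical helper. Returns True if SIGKILL was still needed after
    `grace` seconds.
    """
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return False  # the group is already empty
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not live_members(pgid):
            return False
        time.sleep(0.1)
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass
    return True


if __name__ == "__main__":
    # Stand-in for a job: a shell that is session and group leader (like the
    # "SNs" bash above) plus a child `sleep` in the same process group.
    job = subprocess.Popen(["sh", "-c", "sleep 1000 & wait"],
                           start_new_session=True)
    time.sleep(0.5)  # give sh time to start its child
    pgid = os.getpgid(job.pid)  # equals job.pid because of setsid()
    print("group", pgid, "before:", sorted(live_members(pgid)))
    escalated = kill_process_group(pgid, grace=2.0)
    job.wait()
    print("SIGKILL needed:", escalated, "after:", live_members(pgid))
```

Run as root against a stuck job's group (e.g. 19981), this would be the scripted equivalent of `kill -TERM -- -19981` followed by `kill -KILL -- -19981`; the demo instead creates its own throwaway group, so it is safe to try.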

Did you first try a plain qdel, without -f?

-- Reuti


> user. I am using the set_token_cmd and pag_cmd options to get my
> Kerberos tickets and AFS tokens, so I rely on this job execution
> scheme.
>
>
> When I manually kill the "job_scripts" processes as root (SIGTERM and
> SIGKILL), the jobs terminate as desired (i.e. the epilog script is
> executed). When I try to kill the set_token_cmd, nothing happens. When I
> kill the "sge_shepherd -bg" process, the job terminates immediately
> without running the epilog script.
> Will the terminate_method of the queue help here? Or should I modify
> the set_token_cmd and pag_cmd scripts?
>
>
> Cheers
> Duc
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
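
The PGID column answers "which process group": a signal sent to -19981 reaches only the first job's bash and sleep, while sge_shepherd and sge_coshepherd (which runs set_token_cmd) sit in group 19741, and the pag wrapper shares sge_execd's group 3267. A small Python sketch that groups the listing by PGID makes this explicit; the rows are copied from the listing with wrapped lines rejoined and the tree markers dropped, and PS_ROWS and groups_by_pgid are illustrative names only.

```python
from collections import defaultdict

# Rows of the `ps` listing in this thread: wrapped lines rejoined, "\_" tree
# markers and alignment dropped.
# Columns: USER PPID PID PGID SID TPGID STAT UID COMMAND
PS_ROWS = """\
root 1 3267 3267 2480 -1 S 0 /opt/sge/bin/lx24-x86/sge_execd
root 3267 3273 3267 2480 -1 S 0 /bin/sh /opt/sge/util/loadsensor.sh
root 3267 19737 3267 2480 -1 S 0 /bin/sh /opt/sge/util/pag -c exec /opt/sge/bin/lx24-x86/sge_shepherd -bg
root 19737 19741 19741 2480 -1 S 0 /opt/sge/bin/lx24-x86/sge_shepherd -bg
root 19741 19754 19741 2480 -1 S 0 /opt/sge/bin/lx24-x86/sge_coshepherd /opt/sge/util/set_token_cmd duc 86400
duc 19741 19981 19981 19981 -1 SNs 1025 /bin/bash /opt/sge/sunfire/spool/silab03/job_scripts/281
duc 19981 19983 19981 19981 -1 SN 1025 sleep 2222222
root 3267 21297 3267 2480 -1 S 0 /bin/sh /opt/sge/util/pag -c exec /opt/sge/bin/lx24-x86/sge_shepherd -bg
root 21297 21301 21301 2480 -1 S 0 /opt/sge/bin/lx24-x86/sge_shepherd -bg
root 21301 21314 21301 2480 -1 S 0 /opt/sge/bin/lx24-x86/sge_coshepherd /opt/sge/util/set_token_cmd duc 86400
duc 21301 21698 21698 21698 -1 SNs 1025 /bin/bash /opt/sge/sunfire/spool/silab03/job_scripts/294
duc 21698 21699 21698 21698 -1 SN 1025 sleep 2222222
"""


def groups_by_pgid(ps_text: str) -> dict[int, list[tuple[int, str]]]:
    """Map each PGID to the (PID, COMMAND) pairs a `kill -- -<PGID>` reaches."""
    groups: dict[int, list[tuple[int, str]]] = defaultdict(list)
    for line in ps_text.splitlines():
        if not line.strip():
            continue
        # COMMAND is the 9th column and may itself contain spaces.
        _user, _ppid, pid, pgid, _sid, _tpgid, _stat, _uid, cmd = line.split(maxsplit=8)
        groups[int(pgid)].append((int(pid), cmd))
    return dict(groups)


if __name__ == "__main__":
    for pgid, members in groups_by_pgid(PS_ROWS).items():
        print(f"kill -TERM -- -{pgid} would reach:")
        for pid, cmd in members:
            print(f"  {pid:>6}  {cmd}")
```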




