[GE users] qdel does not kill the job when using pag command?

Duc Bao Ta ta at physik.uni-bonn.de
Thu May 31 13:54:55 BST 2007



Hi Reuti,

I always try to delete the jobs with qdel first and, if nothing happens
after that, I try qdel -f.
With loglevel set to log_info, I only get these messages on the exec node:

05/31/2007 14:51:26|execd|silab03|I|SIGNAL jid: 472 jatask: 1 signal: KILL
05/31/2007 14:51:33|execd|silab03|I|SIGNAL jid: 472 jatask: 1 signal: KILL

Now with qdel -f:
05/31/2007 14:52:13|execd|silab03|I|SIGNAL jid: 472 jatask: 1 signal: KILL
05/31/2007 14:52:53|execd|silab03|I|SIGNAL jid: 472 jatask: 1 signal: KILL
05/31/2007 14:53:14|execd|silab03|I|SIGNAL jid: 472 jatask: 1 signal: KILL
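
To see at a glance how often execd logged a signal for a job, the messages file can be summarised with a few lines of Python. This is only a rough sketch: the field layout (date and time, daemon, host, level, message, separated by "|") is inferred from the lines quoted above and is an assumption, and count_signals is a hypothetical helper, not part of SGE.

```python
import re
from collections import Counter

# Message layout inferred from the quoted log lines (an assumption):
# "date time|daemon|host|level|message"
SIGNAL_RE = re.compile(r"SIGNAL jid: (\d+) jatask: (\d+) signal: (\w+)")


def count_signals(lines):
    """Count (job id, task id, signal name) triples logged by execd."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("|", 4)
        # Skip anything that is not a five-field execd entry.
        if len(fields) != 5 or fields[1] != "execd":
            continue
        match = SIGNAL_RE.search(fields[4])
        if match:
            jid, task, sig = match.groups()
            counts[(int(jid), int(task), sig)] += 1
    return counts


# The five lines quoted in this mail.
sample = [
    "05/31/2007 14:51:26|execd|silab03|I|SIGNAL jid: 472 jatask: 1 signal: KILL",
    "05/31/2007 14:51:33|execd|silab03|I|SIGNAL jid: 472 jatask: 1 signal: KILL",
    "05/31/2007 14:52:13|execd|silab03|I|SIGNAL jid: 472 jatask: 1 signal: KILL",
    "05/31/2007 14:52:53|execd|silab03|I|SIGNAL jid: 472 jatask: 1 signal: KILL",
    "05/31/2007 14:53:14|execd|silab03|I|SIGNAL jid: 472 jatask: 1 signal: KILL",
]

for key, n in count_signals(sample).items():
    print(key, n)
```

The log only shows that a KILL was issued for job 472, task 1. It does not show which pid or process group received it.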

I cannot tell if the SIGKILL is applied to the correct pid.
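
One way to check this, sketched under the assumption of a Linux exec host with /proc mounted: kill(2) called with a negative pid signals every process in that process group, so listing the members of the job's process group shows what a signal to -PGID would reach. The helper group_members below is hypothetical and only illustrates the idea; the PGID of job 472 is not shown in this thread and would have to be read from ps first.

```python
import os


def group_members(pgid):
    """Return the sorted PIDs whose process group ID equals pgid.

    Scans the numeric entries of /proc, so this is Linux-specific.
    """
    members = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            if os.getpgid(pid) == pgid:
                members.append(pid)
        except (ProcessLookupError, PermissionError):
            # The process exited during the scan, or it may not be inspected.
            continue
    return sorted(members)


if __name__ == "__main__":
    # Demonstrated on this script's own process group. On the exec node
    # one would pass the PGID of the job script as reported by ps.
    own_pgid = os.getpgid(0)
    print(own_pgid, group_members(own_pgid))
```

If the job script and its children show up in the list for the PGID that ps reports, a SIGKILL sent to the negative of that PGID should reach them.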

Cheers
Duc

Reuti wrote:
> On 30.05.2007 at 10:27, Duc Bao Ta wrote:
>
>> Hi,
>>
>> I have read the postings about how qdel kills a job, but what process
>> does it kill, i.e. which process group does it kill?
>> My problem is that qdel does not delete the job; the job remains in
>> "dr" state. When I look at the process tree I can see the following (I
>> hope it is readable):
>>
>> USER  PPID   PID  PGID   SID TPGID STAT   UID COMMAND
>> root     1  3267  3267  2480    -1 S        0 /opt/sge/bin/lx24-x86/sge_execd
>> root  3267  3273  3267  2480    -1 S        0   \_ /bin/sh /opt/sge/util/loadsensor.sh
>> root  3267 19737  3267  2480    -1 S        0   \_ /bin/sh /opt/sge/util/pag -c exec /opt/sge/bin/lx24-x86/sge_shepherd -bg
>> root 19737 19741 19741  2480    -1 S        0   |   \_ /opt/sge/bin/lx24-x86/sge_shepherd -bg
>> root 19741 19754 19741  2480    -1 S        0   |       \_ /opt/sge/bin/lx24-x86/sge_coshepherd /opt/sge/util/set_token_cmd duc 86400
>> duc  19741 19981 19981 19981    -1 SNs   1025   |       \_ /bin/bash /opt/sge/sunfire/spool/silab03/job_scripts/281
>> duc  19981 19983 19981 19981    -1 SN    1025   |           \_ sleep 2222222
>> root  3267 21297  3267  2480    -1 S        0   \_ /bin/sh /opt/sge/util/pag -c exec /opt/sge/bin/lx24-x86/sge_shepherd -bg
>> root 21297 21301 21301  2480    -1 S        0       \_ /opt/sge/bin/lx24-x86/sge_shepherd -bg
>> root 21301 21314 21301  2480    -1 S        0           \_ /opt/sge/bin/lx24-x86/sge_coshepherd /opt/sge/util/set_token_cmd duc 86400
>> duc  21301 21698 21698 21698    -1 SNs   1025           \_ /bin/bash /opt/sge/sunfire/spool/silab03/job_scripts/294
>> duc  21698 21699 21698 21698    -1 SN    1025               \_ sleep 2222222
>>
>> There are two jobs, still running after a forced deletion as a manager
>
> The first job should be killed by signalling -19981 (its process
> group), i.e. the bash and the sleep. Can you check in the SGE messages
> file in the spool directory for this node whether it was issued? (Maybe
> loglevel has to be set to "loglevel log_info" in the SGE configuration.)
>
> Did you also try first a qdel without -f?
>
> -- Reuti
>
>
>> user. I am using the set_token_cmd and pag_cmd options to get my
>> kerberos tickets and afs tokens, so I rely on this job execution scheme.
>> Basically
>>
>>
>> When I manually kill (SIGTERM and SIGKILL) the "job_scripts" processes
>> as root, the job terminates as desired (i.e. the epilog script is
>> executed). When I try to kill the set_token_cmd, nothing happens. When
>> I kill the "sge_shepherd -bg" process, the job terminates immediately
>> without calling the epilog script.
>> Will the terminate method of the queue help here? Or should I modify the
>> set_token_cmd and pag_cmd scripts?
>>
>>
>> Cheers
>> Duc
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
