Opened 3 months ago

#1596 new defect

SGE checkpoint not suspending job - Application Level

Reported by: senthilcaesar
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 8.1.9
Severity: major
Keywords: Job suspension not working
Cc:

Description

Hello,

I submit a serial job using the environment below. When I try to suspend the job, qstat -f shows the job as suspended (s), but when I check on the cluster node the job's process is still running. I tried setting the queue's suspend method to "suspend_method SIGSTOP" and ran the job again, but still no luck. On suspension the migr_command is run, but the job itself is not suspended. It seems that SGE is not sending the SIGSTOP signal to the running job. Does anyone have any insight into what could be causing this?
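
For anyone trying to reproduce this, one way to confirm on the execution node whether the job's process has actually stopped is to look at its state in ps: a process that received SIGSTOP normally shows a state beginning with "T". The following is only a minimal sketch, assuming a procps-style ps that supports `-o stat=` and `-p`. The helper name `is_stopped` is hypothetical and not part of SGE.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): succeed if the given PID is stopped.
# Assumes a procps-style ps that accepts "-o stat=" and "-p PID".
is_stopped() {
    # "-o stat=" prints only the state column with no header line;
    # the first letter is the state code (R, S, D, T, Z, ...).
    state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    case "$state" in
        T*) return 0 ;;   # stopped, e.g. by SIGSTOP or SIGTSTP
        *)  return 1 ;;   # running, sleeping, or no such process
    esac
}
```

On the node it could be used as `is_stopped "$pid" && echo stopped || echo "not stopped"`, where `$pid` is the PID of the job's process as found with ps on that host.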

-bash-4.2$ qconf -sckpt check_application
ckpt_name check_application
interface application-level
ckpt_command /tmp/CRIU/checkpoint.sh $job_pid
migr_command /tmp/CRIU/migrate.sh
restart_command /tmp/CRIU/restore.sh
clean_command none
ckpt_dir /tmp
signal usr2
when xsmr

-bash-4.2$ qconf -sq mine.q
qname mine.q
hostlist @allhosts
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list check_application
pe_list make smp mpi
rerun FALSE
slots 1
tmpdir /tmp
shell /bin/sh
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method SIGSTOP
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
