Opened 18 years ago
Last modified 9 years ago
#100 new enhancement
IZ584: qrsh jobs must support rescheduling upon resource shortage (exit 99 rescheduling)
Reported by: | gstuckey | Owned by: | |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | sge | Version: | 5.3p3 |
Severity: | minor | Keywords: | qmaster |
Cc: |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=584]
Issue #: 584
Platform: All
Reporter: gstuckey (gstuckey)
Component: gridengine
OS: All
Subcomponent: qmaster
Version: 5.3p3
CC: None defined
Status: NEW
Priority: P2
Resolution:
Issue type: ENHANCEMENT
Target milestone: ---
Assigned to: ernst (ernst)
QA Contact: ernst
URL:
* Summary: qrsh jobs must support rescheduling upon resource shortage (exit 99 rescheduling)
Status whiteboard:
Attachments:

Issue 584 blocks:
Votes for issue 584:

Opened: Wed Jul 23 06:09:00 -0700 2003
------------------------

Support rescheduling of qrsh jobs that exit with 99.

qrsh -now no ./test_resub.csh

===== begin test_resub.csh =====
#!/bin/csh -f
echo "hello"
sleep 1
exit 99
===== end test_resub.csh =====

It only runs once and exits. If I use qsub, then the script does get rescheduled.

------- Additional comments from andreas Wed Sep 3 06:33:45 -0700 2003 -------
Rescheduling of qrsh jobs is not supported so far.

------- Additional comments from joga Mon Sep 20 07:01:11 -0700 2004 -------
*** Issue 1050 has been marked as a duplicate of this issue. ***

------- Additional comments from joga Tue Sep 21 04:02:33 -0700 2004 -------
The current behaviour (Grid Engine 6.0) is to output a message indicating that the job exited with exit status 99 and exit 1.
Formatting of the output lacks a newline.

% qrsh exit 99
exit_status of job start = 99% echo $?
1

We should consider this behaviour being a bug.
qrsh should silently exit 99 if the started command/script exits with exit code 99.

Implementing real rescheduling for qrsh might be done in a second step.

------- Additional comments from andreas Wed Nov 3 08:14:26 -0700 2004 -------
Changed summary to better reflect primary problem.

------- Additional comments from andreas Mon Nov 8 10:45:10 -0700 2004 -------
There might be a chance to cause qrsh jobs be rescheduled upon exit of prolog. AFAIK there isn't a need for qrsh-side changes if shepherd/execd/qmaster handle that case orderly.
------- Additional comments from joga Tue Nov 9 06:42:11 -0700 2004 -------
We also have to handle the issue in qrsh (qsh binary):
When the reschedule situation is detected, qrsh may not exit, but has to repeat previous steps:
- wait for a shepherd connection
- spawn a rsh client to connect to the port bound by shepherd (qlogin_starter)
- in verbose mode, it should give some information about the rescheduling.

------- Additional comments from andreas Thu Mar 31 09:27:27 -0700 2005 -------
Changed to P2 RFE. Changed summary.

------- Additional comments from pollinger Fri Dec 9 08:23:57 -0700 2005 -------
Changed subcomponent
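
The qrsh-side steps joga lists on Nov 9 2004 (wait for a shepherd connection, spawn an rsh client, report the rescheduling in verbose mode) amount to a retry loop around the connect phase. The following is only a control-flow sketch in Python, not Grid Engine code: `wait_for_shepherd` and `run_rsh_client` are hypothetical stand-ins supplied by the caller, and the assumption that the client can see the job's exit status as a plain integer is made for illustration.

```python
from typing import Callable

RESCHEDULE_EXIT = 99  # exit status that requests rescheduling, per this ticket


def qrsh_session_loop(
    wait_for_shepherd: Callable[[], int],
    run_rsh_client: Callable[[int], int],
    verbose: bool = False,
    log: Callable[[str], None] = print,
) -> int:
    """Repeat the connect steps until the job ends without asking for a reschedule.

    Both callables are hypothetical stand-ins, not Grid Engine APIs:
    wait_for_shepherd() blocks and returns the port bound by shepherd;
    run_rsh_client(port) runs the rsh client and returns the job's exit status.
    """
    attempt = 0
    while True:
        attempt += 1
        port = wait_for_shepherd()     # step 1: wait for a shepherd connection
        status = run_rsh_client(port)  # step 2: connect the rsh client to that port
        if status != RESCHEDULE_EXIT:
            return status              # normal end: hand the status back to the caller
        if verbose:                    # step 3: report the reschedule in verbose mode
            log(f"attempt {attempt}: job exited with {RESCHEDULE_EXIT}, "
                "waiting for it to be rescheduled")
```

Passing the two steps in as callables keeps the loop testable without a cluster; a real implementation would also need a way to stop waiting if the job is deleted while pending.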
Current (8.0.0) behaviour of qrsh exit 99 is to exit silently with code 99.
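
Since qrsh now passes exit code 99 through unchanged, a caller can approximate rescheduling on the client side by re-running the command while it returns 99. The sketch below is a workaround under stated assumptions, not the rescheduling requested in this ticket: every retry is a fresh submission rather than a job rescheduled by qmaster, and the function name `run_until_not_99` and the `max_attempts` cap are hypothetical choices made for this example.

```python
import subprocess
from typing import Sequence

RESCHEDULE_EXIT = 99  # exit status that requests rescheduling, per this ticket


def run_until_not_99(cmd: Sequence[str], max_attempts: int = 5) -> int:
    """Run cmd, re-running it while it exits with 99, at most max_attempts times.

    Returns the last exit status seen, which is still 99 if every attempt
    asked for a reschedule.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = RESCHEDULE_EXIT
    for _ in range(max_attempts):
        status = subprocess.run(list(cmd)).returncode
        if status != RESCHEDULE_EXIT:
            break  # any other status is final and is returned as-is
    return status


# Example call, using the reproducer from the description:
#   run_until_not_99(["qrsh", "-now", "no", "./test_resub.csh"])
```

The cap on attempts is there because a script that always exits 99, like the reproducer above, would otherwise loop forever.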