Opened 18 years ago

Last modified 10 years ago

#100 new enhancement

IZ584: qrsh jobs must support rescheduling upon resource shortage (exit 99 rescheduling)

Reported by: gstuckey Owned by:
Priority: high Milestone:
Component: sge Version: 5.3p3
Severity: minor Keywords: qmaster


[Imported from gridengine issuezilla]

        Issue #:      584              Platform:     All           Reporter: gstuckey (gstuckey)
       Component:     gridengine          OS:        All
     Subcomponent:    qmaster          Version:      5.3p3            CC:    None defined
        Status:       NEW              Priority:     P2
      Resolution:                     Issue type:    ENHANCEMENT
                                   Target milestone: ---
      Assigned to:    ernst (ernst)
      QA Contact:     ernst
       * Summary:     qrsh jobs must support rescheduling upon resource shortage (exit 99 rescheduling)
   Status whiteboard:

     Issue 584 blocks:
   Votes for issue 584:

   Opened: Wed Jul 23 06:09:00 -0700 2003 

Support rescheduling of qrsh jobs that exit with 99.

qrsh -now no ./test_resub.csh

===== begin test_resub.csh =====
#!/bin/csh -f
echo "hello"
sleep 1
exit 99
===== end test_resub.csh =====

It only runs once and exits.

If I use qsub, then the script does get rescheduled.

   ------- Additional comments from andreas Wed Sep 3 06:33:45 -0700 2003 -------
Rescheduling of qrsh jobs is not supported so far.

   ------- Additional comments from joga Mon Sep 20 07:01:11 -0700 2004 -------
*** Issue 1050 has been marked as a duplicate of this issue. ***

   ------- Additional comments from joga Tue Sep 21 04:02:33 -0700 2004 -------
The current behaviour (Grid Engine 6.0) is to print a message
indicating that the job exited with exit status 99, and then to exit with status 1.
The message also lacks a trailing newline:

% qrsh exit 99
exit_status of job start = 99% echo $?

We should consider this behaviour a bug.
qrsh should silently exit with 99 if the started command/script exits with
exit code 99.

Implementing real rescheduling for qrsh might be done in a second step.
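The "silently exit 99" behaviour proposed above amounts to passing the child's exit status through unchanged. The sketch below illustrates that idea with a local subprocess. It is not qrsh code: the function name `run_and_propagate` is hypothetical, and mapping signal deaths to 128 + signal number is an added assumption that follows the usual shell convention.

```python
import subprocess
import sys


def run_and_propagate(cmd: list[str]) -> int:
    """Run cmd and return its exit status unchanged, printing nothing.

    Hypothetical illustration of the proposed qrsh behaviour: if the
    started command exits with 99, the caller also sees 99, not 1.
    """
    completed = subprocess.run(cmd)
    # A negative returncode means the child was killed by a signal;
    # map it to the shell convention 128 + signal number (assumption).
    if completed.returncode < 0:
        return 128 - completed.returncode
    return completed.returncode


if __name__ == "__main__":
    sys.exit(run_and_propagate(sys.argv[1:]))
```

With this wrapper, a command that runs `exit 99` makes the wrapper itself exit with 99, and it writes nothing extra to stdout.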

   ------- Additional comments from andreas Wed Nov 3 08:14:26 -0700 2004 -------
Changed summary to better reflect primary problem.

   ------- Additional comments from andreas Mon Nov 8 10:45:10 -0700 2004 -------
It might be possible to get qrsh jobs rescheduled upon
exit of the prolog. AFAIK no qrsh-side changes would be needed if
shepherd/execd/qmaster handle that case properly.

   ------- Additional comments from joga Tue Nov 9 06:42:11 -0700 2004 -------
We also have to handle the issue in qrsh (qsh binary):
When the reschedule situation is detected, qrsh must not exit. Instead it has
to repeat the previous steps:
- wait for a shepherd connection
- spawn an rsh client to connect to the port bound by the shepherd
- in verbose mode, it should give some information about the rescheduling.
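The steps listed above can be sketched as a client-side loop. The sketch rests on several assumptions. `wait_for_shepherd` and `spawn_rsh_client` are hypothetical stand-ins for qrsh internals, and the loop treats a remote exit status of 99 as the reschedule signal, although the comment does not say how qrsh would detect rescheduling. The `max_attempts` cap is also an added assumption, included so the loop cannot spin forever.

```python
from typing import Callable

RESCHEDULE_EXIT = 99  # assumption: exit status 99 signals a reschedule


def qrsh_session(
    wait_for_shepherd: Callable[[], int],
    spawn_rsh_client: Callable[[int], int],
    verbose: bool = False,
    max_attempts: int = 10,
) -> int:
    """Hypothetical qrsh client loop that survives rescheduling.

    wait_for_shepherd: blocks until a shepherd connects, returns its port.
    spawn_rsh_client: connects an rsh client to that port and returns
    the remote command's exit status.
    """
    status = 0
    for attempt in range(1, max_attempts + 1):
        port = wait_for_shepherd()
        status = spawn_rsh_client(port)
        if status != RESCHEDULE_EXIT:
            # Normal completion: hand the job's status back to the caller.
            return status
        if verbose:
            print(f"job rescheduled (attempt {attempt}), "
                  "waiting for a new shepherd connection")
    # Attempts exhausted: report the reschedule status unchanged.
    return status
```

In this design, qrsh stays alive across reschedules and only reports back to the user once a run finishes with a status other than 99.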

   ------- Additional comments from andreas Thu Mar 31 09:27:27 -0700 2005 -------
Changed to P2 RFE.
Changed summary.

   ------- Additional comments from pollinger Fri Dec 9 08:23:57 -0700 2005 -------
Changed subcomponent

Change History (1)

comment:1 Changed 10 years ago by dlove

  • Severity set to minor

The current (8.0.0) behaviour of "qrsh exit 99" is to exit silently with code 99.
