Opened 15 years ago

Last modified 8 years ago

#107 new enhancement

IZ624: Overflow not handled if large time interval parameters are used

Reported by: nmm        Owned by:
Priority:    low        Milestone:
Component:   sge        Version:   5.3p4
Severity:               Keywords:  cleanup
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=624]

        Issue #:      624              Platform:     All           Reporter: nmm (nmm)
       Component:     gridengine          OS:        All
     Subcomponent:    cleanup          Version:      5.3p4            CC:    None defined
        Status:       REOPENED         Priority:     P4
      Resolution:                     Issue type:    ENHANCEMENT
                                   Target milestone: ---
      Assigned to:    andreas (andreas)
      QA Contact:     ernst
          URL:
       * Summary:     Overflow not handled if large time interval parameters are used
   Status whiteboard:
      Attachments:

     Issue 624 blocks:
   Votes for issue 624:


   Opened: Fri Nov 14 09:38:00 -0700 2003 
------------------------


In sge_parse_num_par.c, there are many places
where the parsing of TYPE_DOUBLE values uses a
u_long32 intermediary (e.g. sge_parse_num_val on
line 724).  This does not work on 64-bit hosts,
where values can exceed the 32-bit range, and will
cause workload balancing (and perhaps other
features) to fail on them.  This can be seen by
compiling Grid Engine with -fptrap=common under
Solaris 9, whereupon sge_qmaster aborts with a
SIGFPE at startup if the history includes jobs
whose memory values do not fit in 32 bits.
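
To illustrate the kind of guard the description calls for, here is a minimal
sketch in C. It is not taken from sge_parse_num_par.c; the helper names
double_fits_u32 and double_to_u32_checked are hypothetical, and uint32_t
stands in for u_long32. In C, converting a double whose integral part does
not fit the target unsigned type is undefined behavior, and with floating
point traps enabled such a conversion may plausibly surface as a SIGFPE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of Grid Engine: reports whether a double
 * can be converted to a 32-bit unsigned value without leaving its range.
 * NaN fails both comparisons, so it is rejected as well. */
static bool double_fits_u32(double v)
{
    return v >= 0.0 && v <= (double)UINT32_MAX;
}

/* Hypothetical helper: converts only after the range check succeeded,
 * so the out-of-range conversion is never executed. */
static bool double_to_u32_checked(double v, uint32_t *out)
{
    if (!double_fits_u32(v)) {
        return false;           /* caller must report the overflow */
    }
    *out = (uint32_t)v;         /* safe: 0 <= v <= UINT32_MAX */
    return true;
}
```

The check is deliberately conservative: it rejects every negative value and
anything above 4294967295, so a memory value such as 8 GB in bytes is
reported as an overflow instead of being pushed through the 32-bit
intermediary.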

   ------- Additional comments from andreas Fri Nov 14 10:04:28 -0700 2003 -------
When setting a job's resource limits, only the output parameter

  sge_rlim_t *rlimp

of sge_parse_num_val() is used by sge_shepherd. So this should be
safe, because sge_rlim_t maps to the 64-bit aware rlim_t when
compiled on a Solaris 9 host.

Two questions:
(1) You seem to be using your own Grid Engine 5.3p4 binaries
    rather than the gridengine.sunsource.net solaris64 5.3p4 binaries.
    Is there a special reason for that, and can you say why you are
    using the -fptrap=common compile option?
(2) Can you describe in more detail how the SIGFPE can be
    reproduced? I.e. what commands do you use?

It might be a good idea to use gridengine.sunsource.net dev@ or users@
mailing lists to clarify those questions.

   ------- Additional comments from andreas Mon Nov 17 07:24:19 -0700 2003 -------
In a gridengine.sunsource.net-binaries based solaris64 5.3p4
system no SIGFPE will happen since -fptrap=common is currently
not used to build those binaries.

However, there is a set of Grid Engine interfaces where values
*larger* than 4294967295 are treated as if they were 4294967295.
For these interfaces a better definition of the value range
is needed, as well as related checks of the value range:

queue_conf(5)
  min_cpu_interval
  s_rt
  h_rt

sge_conf(5)
  reschedule_unknown
  load_report_time
  suspend_interval
  stat_log_time

sched_conf(5)
  schedule_interval
  load_adjustment_decay_time
  sgeee_schedule_interval

qsub(1)
  -c <interval>
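
A range check of the kind asked for above could look like the following
sketch in C. It is an illustration only, not Grid Engine code: the function
parse_interval_seconds and the enum interval_status are hypothetical names,
and the sketch assumes the interval is given as a plain decimal number of
seconds, ignoring any other time formats the real parser accepts. Instead of
silently treating larger values as 4294967295, it reports them as out of
range.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result codes for this sketch. */
enum interval_status { INTERVAL_OK, INTERVAL_SYNTAX, INTERVAL_RANGE };

/* Hypothetical parser: accepts a plain decimal number of seconds and
 * rejects anything that does not fit in 32 bits instead of clamping it. */
static enum interval_status parse_interval_seconds(const char *s, uint32_t *out)
{
    char *end = NULL;
    unsigned long long v;

    /* Require a leading digit; this also rejects "", "-1" and " 5". */
    if (s == NULL || *s < '0' || *s > '9') {
        return INTERVAL_SYNTAX;
    }

    errno = 0;
    v = strtoull(s, &end, 10);
    if (*end != '\0') {
        return INTERVAL_SYNTAX;     /* trailing garbage such as "12x" */
    }
    if (errno == ERANGE || v > UINT32_MAX) {
        return INTERVAL_RANGE;      /* larger than 4294967295 */
    }

    *out = (uint32_t)v;
    return INTERVAL_OK;
}
```

With this approach, an h_rt of 4294967296 would produce an explicit range
error that the caller can turn into a message, rather than being accepted
as 4294967295.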

   ------- Additional comments from andreas Thu Mar 4 04:08:41 -0700 2004 -------
Will not be fixed with 6.0 beta.

   ------- Additional comments from andreas Mon Mar 29 04:58:45 -0700 2004 -------
Reopened.

   ------- Additional comments from rhierlmeier Mon Jul 26 02:34:11 -0700 2004 -------
Compile option -fptrap=common is currently not supported by SGE. The
fix of this issue requires a complete rewrite of the parameter parsing.
This could not be done in a patch.

   ------- Additional comments from sgrell Tue Dec 6 07:56:52 -0700 2005 -------
Changed subcomponent.

Stephan

   ------- Additional comments from ernst Mon Dec 12 06:42:10 -0700 2005 -------
Changed subcomponent and issue type.
