Ticket #197 (new enhancement)

Opened 10 years ago

Last modified 3 years ago

IZ1254: Entry in PE to change multiplication of resource limits

Reported by: reuti
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 6.0
Severity:
Keywords: Linux scheduling
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1254]

        Issue #:         1254
       Component:        gridengine
     Subcomponent:       scheduling
        Platform:        Other
           OS:           Linux
         Version:        6.0
        Reporter:        reuti (reuti)
           CC:           uddeborg
         Status:         NEW
        Priority:        P3
       Resolution:
       Issue type:       ENHANCEMENT
    Target milestone:    ---
      Assigned to:       andreas (andreas)
      QA Contact:        andreas
           URL:
         Summary:        Entry in PE to change multiplication of resource limits
   Status whiteboard:
      Attachments:
   Issue 1254 blocks:
  Votes for issue 1254:

   Opened: Fri Aug 27 02:10:00 -0700 2004 
------------------------


In the current implementation, the resource limits
of a queue/job are multiplied by the number of slots taken
on the master machine (and, it seems, not for qrsh
processes).

There should be two switches in the configuration of a
PE:

multiply_limits_for_master_process
multiply_limits_for_slave_processes

On the one hand, you have to multiply the resource
limits to get the correct limits for the master processes
(I also found that there is a time delay until the resource
consumption of all child processes is accounted to
the parent process, in contrast to the immediate
enforcement of the limits for a single process). On the
other hand, the multiplication may be wrong for
processes that create child processes by (q)rsh.

E.g. Gaussian03 with Linda: you can request 8
processes on 4 machines and decide in the Gaussian
input file either to issue 7 (q)rsh calls, or to issue only 3
(q)rsh calls and create the other tasks as threads. You have
to decide this from job to job, because some calculation
types are only Linda parallel, others are only thread
parallel.

With the availability of the switches for the PEs, I would
just create two PEs and would get the correct limits for
each job type.
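
The effect of the two proposed switches can be sketched roughly as follows. This is only an illustration of the request, not SGE code: the class `PeSwitches`, the function `effective_limit`, and the 1024 MB per-slot limit are made up for the example, and it assumes the limit is enforced per started process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeSwitches:
    """Hypothetical PE entries proposed in this issue (not existing SGE settings)."""
    multiply_limits_for_master_process: bool
    multiply_limits_for_slave_processes: bool


def effective_limit(per_slot_limit_mb: int, slots_on_host: int,
                    pe: PeSwitches, is_master: bool) -> int:
    """Return the limit to enforce for one process started on a host."""
    if is_master:
        multiply = pe.multiply_limits_for_master_process
    else:
        multiply = pe.multiply_limits_for_slave_processes
    return per_slot_limit_mb * slots_on_host if multiply else per_slot_limit_mb


# 8 slots on 4 machines -> 2 slots per host, 1024 MB per slot (made-up value).
# Job type A: 3 x (q)rsh, each process runs 2 threads -> multiply everywhere.
pe_threads = PeSwitches(True, True)
# Job type B: 7 x (q)rsh, one single-threaded process per slot -> no multiplication.
pe_linda = PeSwitches(False, False)

print(effective_limit(1024, 2, pe_threads, is_master=False))  # 2048
print(effective_limit(1024, 2, pe_linda, is_master=False))    # 1024
```

Which switch combination is right for the master process in the pure Linda case depends on how the limit is accounted on the master machine; the values above are one plausible choice.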

   ------- Additional comments from reuti Fri Aug 27 05:21:45 -0700 2004 -------
To limit the number of (q)rsh commands allowed by SGE, maybe it would
be better to have an entry:

limit_to_one_qrsh_per_host yes/no

instead of the suggested:

multiply_limits_for_slave_processes yes/no.

The latter should also be applied, but can be derived from the first one.
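
One way to read "can be derived" is sketched below. It assumes that a single (q)rsh per host implies that this one process uses all slots granted on that host, so its limits should be multiplied; both function names are hypothetical, and the master process on the master machine is ignored for simplicity.

```python
def derive_multiply_for_slaves(limit_to_one_qrsh_per_host: bool) -> bool:
    """Derive the second switch from the first (assumption, see lead-in)."""
    # One (q)rsh per host -> that process stands for all slots on the host.
    return limit_to_one_qrsh_per_host


def may_start_qrsh(already_running_on_host: int, slots_on_host: int,
                   limit_to_one_qrsh_per_host: bool) -> bool:
    """Decide whether one more (q)rsh may be started on a host."""
    allowed = 1 if limit_to_one_qrsh_per_host else slots_on_host
    return already_running_on_host < allowed
```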

   ------- Additional comments from sgrell Mon Dec 12 02:44:19 -0700 2005 -------
Changed subcomponent.

Stephan

   ------- Additional comments from reuti Thu Aug 24 12:33:46 -0700 2006 -------
A similar feature would be to allow or disallow the multiplication of resource requests:

multiply_resource_requests

   ------- Additional comments from uddeborg Thu May 31 09:17:55 -0700 2007 -------
I find it a bit silly to have to add a comment just to add yourself to the CC
list. :-)

   ------- Additional comments from reuti Fri Apr 25 05:13:18 -0700 2008 -------
Over time I have come to think it is better to specify it additionally at the complex level, with an additional column
"multiply yes/no". The reason is that, e.g., for a memory limit it might be necessary to have it per slot while
at the same time the license is per job. OTOH, different parallel jobs in the same cluster might or might not need the
multiplication of a memory limit (OpenMP jobs work on the same memory area, while
MPI ones don't). Hence the entry in the PE would still be advantageous.

   ------- Additional comments from reuti Fri Apr 25 05:25:38 -0700 2008 -------
Or an entry in the PE listing the complexes that are not to be multiplied.
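
How a per-complex "multiply" column and a per-PE exclusion list could interact is sketched below. Everything here is hypothetical: the function `total_request`, the dictionary layout, and the complex name `license` are invented for illustration; only `h_vmem` comes from this discussion.

```python
def total_request(complex_name, per_request, slots,
                  complex_multiply, pe_not_multiplied=frozenset()):
    """Return the amount to book for a job that occupies `slots` slots."""
    # Complex-level default: multiply unless the complex says otherwise.
    multiply = complex_multiply.get(complex_name, True)
    # PE-level override: complexes listed here are never multiplied.
    if complex_name in pe_not_multiplied:
        multiply = False
    return per_request * slots if multiply else per_request


# Hypothetical complex configuration: memory per slot, license per job.
complex_multiply = {"h_vmem": True, "license": False}

# MPI-style PE: separate memory per process, so h_vmem is multiplied.
print(total_request("h_vmem", 2, 4, complex_multiply))              # 8
# OpenMP-style PE: threads share memory, so the PE excludes h_vmem.
print(total_request("h_vmem", 2, 4, complex_multiply, {"h_vmem"}))  # 2
# The license is per job in both cases.
print(total_request("license", 1, 4, complex_multiply))             # 1
```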

   ------- Additional comments from roland Fri Jan 30 06:02:22 -0700 2009 -------
Reuti's note from Apr 25 2008 will be implemented in 6.2u2 by the non-multiplied
resource requests. For more information, please see:
http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/devel/rfe/non-multiplied-pe-requests.txt

   ------- Additional comments from reuti Fri Jan 30 06:46:04 -0700 2009 -------
It is nice, of course, to get this feature, as it will solve the odd handling of licenses where only one is needed
per job, but it will not be enough to cover a mix of jobs in the cluster. That is why I wrote that an entry in
the PE would still be advantageous: h_vmem can only be JOBS or YES.

See also: http://gridengine.sunsource.net/ds/viewMessage.do?dsMessageId=98580&dsForumId=38

Shall I enter a new issue for this?

   ------- Additional comments from roland Fri Jan 30 06:55:59 -0700 2009 -------
There is no need to create a new issue because I did not change the state of
this one.