Opened 20 years ago

Last modified 11 years ago

#6 new enhancement

IZ75: Need additional means to request resources separately for master task and for slave tasks

Reported by: andy Owned by:
Priority: normal Milestone:
Component: sge Version: current
Severity: Keywords: scheduling


[Imported from gridengine issuezilla]

          Issue #:      75
         Platform:      All
         Reporter:      andy (andy)
        Component:      gridengine
               OS:      All
     Subcomponent:      scheduling
          Version:      current
               CC:      agrajag, gp26, reuti
           Status:      NEW
         Priority:      P3
       Resolution:
       Issue type:      ENHANCEMENT
 Target milestone:      ---
      Assigned to:      ernst (ernst)
       QA Contact:      andreas
          Summary:      Need additional means to request resources separately for master task and for slave tasks
Status whiteboard:

     Issue 75 blocks:
   Votes for issue 75:  6

   Opened: Mon Oct 15 08:46:00 -0700 2001 

> For parallel jobs resource requests for consumables are always multiplied
> with the number of slots which are granted for a job - this makes it
> very difficult (for fixed slot requests) or impossible (for ranges) to
> manage per job consumables.
> I think the consumable definition should be extended to define resource
> requests per job.

Enhancing the consumable definition does not appear to be the best answer
to this problem. It should instead be addressed by refining the capabilities
of our job-request syntax to allow requests of different types referring
to the same resource.
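
The multiplication described in the quoted report can be illustrated with a small model. This is only a sketch of the arithmetic, not Grid Engine code; the function name `debited_amount` and the slot range 4-8 are made up for illustration.

```python
def debited_amount(per_slot_request, granted_slots):
    """Model of the behaviour described above: a consumable request
    is multiplied by the number of slots granted to the job."""
    return per_slot_request * granted_slots


# Hypothetical job that needs exactly one unit of a consumable in total,
# submitted with a slot range of 4-8. The total debited depends on how
# many slots the scheduler finally grants, which is unknown at submit time.
totals = {slots: debited_amount(1, slots) for slots in range(4, 9)}
print(totals)  # {4: 4, 5: 5, 6: 6, 7: 7, 8: 8}
```

No single per-slot value yields a constant total across the range, which is why per-job consumables cannot be expressed with ranges under the current rule.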

   ------- Additional comments from andreas Mon Jun 3 02:28:09 -0700 2002 -------
One proposal for fixing this is to introduce one or two new
switches for submission: -master and -all.

Like the -soft/-hard switch pair, the new -master/-all pair will be used
to switch between 'all' and 'master' regions. All
resource requests specified in a 'master' region
(i.e. -l and -q) are treated as requests valid only for the master
task. Resources requested via -master -l are also debited
only for the master task: for queue-based resources at the
master queue, for host-based resources at the master host, and
for global resources at global scope. Resource requests in an -all
region are handled as before. It is possible to submit jobs with
both -all and -master resource requests.

The enhancement will allow specifying:
(1) wishes for all tasks of a parallel application (-soft -all -l)
(2) requests for all tasks of a parallel application (-hard -all -l)
(3) wishes for the master task (-soft -master -l)
(4) requests for the master task (-hard -master -l)

It might turn out that (3) is not really needed. If so we could
omit the -all switch and make it a switch triple -soft/-hard/-master.
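
The debiting rule of the -master/-all proposal can be sketched as a small model. This is an illustration under assumptions, not an implementation: the function `debit_regions` and the resource names `mem_gb` and `lic` are hypothetical, and the model only computes totals, ignoring whether a resource is queue-based, host-based or global.

```python
def debit_regions(all_requests, master_requests, slots):
    """Return the total amount debited per resource.

    all_requests:    requests from an 'all' region, debited once per slot
                     (the existing behaviour)
    master_requests: requests from a 'master' region, debited once,
                     for the master task only (the proposed behaviour)
    """
    totals = {}
    for name, amount in all_requests.items():
        totals[name] = totals.get(name, 0) + amount * slots
    for name, amount in master_requests.items():
        totals[name] = totals.get(name, 0) + amount
    return totals


# Hypothetical 8-slot job: 2 GB per task, plus one license for the master task.
print(debit_regions({"mem_gb": 2}, {"lic": 1}, 8))  # {'mem_gb': 16, 'lic': 1}
```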

   ------- Additional comments from andreas Tue Nov 26 03:36:08 -0700 2002 -------
Two different approaches are under discussion: 'job consumables' and
'enhancing the job request syntax'.

The weakness of the job consumables idea is that it requires
the administrator to decide for each consumable resource
whether it should be treated as a per-job or a per-slot
resource. For certain resources there is also a need to allow
them to be requested both on a per-job basis and on a per-slot basis. If
the administrator is constrained to decide this question
a priori for all jobs within the cluster, it will not be possible
to handle jobs of both types adequately within the same Grid
Engine installation:
* jobs that require a constant amount of this very resource
  independently of the number of slots they are finally granted
  by the scheduler
* jobs that require a varying amount of this very resource
  depending on the number of slots they are finally granted by the
  scheduler
The resources for which this deficiency has a very real meaning are
quite basic ones such as 'h_vmem' or 'h_fsize'.

The idea of enhancing the job-request syntax instead is to allow
for both of these types. The solution sketched so far in this issue
could be extended to allow for expressing three variations of resource
requests:
* those that count for the whole job
* those that count for each host where a distributed memory
  parallel job runs
* those that count for each task of the job
The sites for which this flexibility will matter are
HPC sites that run different types of parallel jobs such as MPI,
OpenMP and hybrid MPI/OpenMP jobs. Enhancing the job-request syntax
has the advantage that the very same consumable resource
can be requested in various fashions.
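
The three request variations listed above differ only in what the requested amount is multiplied by. A minimal sketch, assuming a hypothetical `total_debit` function, made-up scope names ("job", "host", "task") and an invented allocation:

```python
def total_debit(amount, scope, allocation):
    """Total amount of a consumable debited for one request.

    allocation maps a host name to the number of tasks granted on it.
    scope is one of "job", "host" or "task" (illustrative names only).
    """
    if scope == "job":
        return amount                             # counted once per job
    if scope == "host":
        return amount * len(allocation)           # once per host used
    if scope == "task":
        return amount * sum(allocation.values())  # once per task
    raise ValueError(f"unknown scope: {scope}")


# Hypothetical hybrid MPI/OpenMP job spread over three hosts, ten tasks.
allocation = {"node1": 4, "node2": 4, "node3": 2}
for scope in ("job", "host", "task"):
    print(scope, total_debit(1, scope, allocation))
# job 1
# host 3
# task 10
```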

   ------- Additional comments from agrajag Wed Oct 5 09:53:16 -0700 2005 -------
This issue goes beyond consumables (although they are an important part of it).

Sometimes a master task needs rare resources that the other tasks don't.  For
instance, my 300+ node cluster only has 12 machines with 8GB of RAM.  Some of
our users need to have their master task on those machines, but want to have a
job with more than 24 tasks.

When I think about any reason a user might want to restrict where the master
task is run, it seems better to indicate that resource with a complex rather
than using -masterq and a queue name.

See this thread for more info:

   ------- Additional comments from reuti Wed Oct 5 13:02:22 -0700 2005 -------
The statement about hybrid MPI/OpenMP jobs is like the one I issued later:

(Obviously I didn't read all past issues in full :-( )

   ------- Additional comments from andreas Thu Oct 6 02:19:36 -0700 2005 -------
I agree on the usefulness of

-masterl resource_list      (requests applying to master task only)
-slavel  resource_list      (requests applying to all slave tasks)

as desired by users.

   ------- Additional comments from andreas Thu Nov 17 07:04:41 -0700 2005 -------
To work around this problem for parallel jobs, one can divide the license
request by the number of tasks. The jobs would then be submitted like this:

   qsub -pe fluent 2 -l fluent_paralell=0.5
   qsub -pe fluent 4 -l fluent_paralell=0.25
   qsub -pe fluent 8 -l fluent_paralell=0.125

and the complex must be configured globally via qconf -me global.
This is still not a master task request, but it works.
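
The arithmetic behind this workaround can be checked with a short sketch. The helper name `per_slot_request` is hypothetical; exact fractions are used so that the product is exactly one license regardless of the slot count.

```python
from fractions import Fraction


def per_slot_request(total_licenses, slots):
    """Per-slot value to request so that request * slots == total_licenses."""
    return Fraction(total_licenses, slots)


# The three submissions shown above: 2, 4 and 8 slots, one license in total.
for slots in (2, 4, 8):
    request = per_slot_request(1, slots)
    print(slots, float(request), request * slots)
# 2 0.5 1
# 4 0.25 1
# 8 0.125 1
```

The divisor has to be known at submission time, so this only fits jobs with a fixed slot count, not slot ranges.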

   ------- Additional comments from andreas Fri Nov 18 03:12:09 -0700 2005 -------
See issue #24 for an alternative approach to (partially) address this.

   ------- Additional comments from pollinger Fri Dec 9 08:51:27 -0700 2005 -------
changed subcomponent

   ------- Additional comments from ernst Mon Dec 12 06:58:03 -0700 2005 -------
Changed subcomponent

   ------- Additional comments from gp26 Thu Jun 29 08:27:36 -0700 2006 -------

I am very interested in the -master proposal! Let me explain the situation:
we use software that needs specific machines for "slaves" and specific
machines for "masters".

With 6.0u8, it's *impossible* to run a PE job like this:
qsub -pe my_pe 10 -q q1@@slaves -masterq q1@@masters <command>
the result is a job pending forever, with qstat -r always complaining:
"cannot run in PE "my_pe" because it only offers 100 slots" whereas I
requested only 10 slots...

One way of making it work is to add the masterq host group (@masters)
to the -q option. This is not what I want to do, because there is a risk of
allocating another "master" host as a slave, and I need to keep the masters free
for other PE jobs. One solution is to use a soft resource request, so that slaves
are used preferably, but the job *can* still use a master when resources are lacking.

In that respect, the -master option would give the freedom to specify a
hard-request, only to the @masters group specified in -masterq.

   ------- Additional comments from andreas Thu Jun 12 08:01:18 -0700 2008 -------
@gp26: I think what you request here was implemented as issue #2378 and issue
#2603 in 6.2. A summary on both can be found under

I'm leaving this issue open as it still requests better means to control
slave/master task resource requests, as sketched with the -master/-slave options
that would allow different -l rsrc=val requests for the master and slave
tasks of a parallel job.
