Opened 10 years ago

Last modified 8 years ago

#609 new enhancement

IZ2819: PE int allocation rule should work with non-multiples

Reported by: templedf Owned by:
Priority: normal Milestone:
Component: sge Version: 6.2
Severity: minor Keywords: scheduling
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2819]

        Issue #:      2819             Platform:     All           Reporter: templedf (templedf)
       Component:     gridengine          OS:        All
     Subcomponent:    scheduling       Version:      6.2              CC:    None defined
        Status:       NEW              Priority:     P1
      Resolution:                     Issue type:    ENHANCEMENT
                                   Target milestone: ---
      Assigned to:    andreas (andreas)
      QA Contact:     andreas
          URL:
       * Summary:     PE int allocation rule should work with non-multiples
   Status whiteboard:
      Attachments:

     Issue 2819 blocks:
   Votes for issue 2819:


   Opened: Thu Dec 4 12:20:00 -0700 2008 
------------------------


Let's look at an example of the current behavior:

# qconf -sp 3per
pe_name           3per
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   3
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min
# qconf -sq all.q | grep pe_list
pe_list               make 3per
# qsub -pe 3per 2 examples/jobs/sleeper.sh
Your job 21408 ("Sleeper") has been submitted
# qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@ultra20                  BIP   0/4       0.12     sol-amd64
############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
 21408 0.55500 Sleeper    root         qw    12/01/2008 10:32:02     2
-bash-3.2# qstat -j 21408
==============================================================
job_number:                 21408
exec_file:                  job_scripts/21408
submission_time:            Mon Dec  1 10:32:02 2008
owner:                      root
uid:                        0
group:                      root
gid:                        0
sge_o_home:                 /root
sge_o_log_name:             root
sge_o_path:                 /usr/local/sge6.2/bin/sol-amd64:/usr/sbin:/usr/bin
sge_o_shell:                /usr/bin/bash
sge_o_tz:                   US/Pacific
sge_o_workdir:              /root
sge_o_host:                 ultra20
account:                    sge
mail_list:                  root@ultra20
notify:                     FALSE
job_name:                   Sleeper
jobshare:                   0
shell_list:                 /bin/sh
env_list:
script_file:                /usr/local/sge6.2/examples/jobs/sleeper.sh
parallel environment:  3per range: 2
scheduling info:            cannot run in PE "3per" because it only offers 0 slots

I have a PE with an int allocation rule.  Unless I submit jobs that request a
multiple of that int, the jobs won't be scheduled.  Instead, it should be
possible to submit a job with, for example, -pe 3per 4, and get three slaves on
one host and 1 slave on another.
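
The behavior reported above can be modeled in a few lines. This is only a sketch of what the ticket describes, not SGE scheduler code; the function name `fixed_rule_layout` and its parameters are hypothetical.

```python
def fixed_rule_layout(total_slots, per_host):
    """Model of the reported behavior of an integer allocation_rule.

    Hypothetical helper, not part of SGE. Returns the per-host slot
    counts, or None when the request cannot be scheduled.
    """
    if per_host <= 0:
        raise ValueError("per_host must be a positive integer")
    if total_slots % per_host != 0:
        # Non-multiple request: the job stays pending, as job 21408 does.
        return None
    return [per_host] * (total_slots // per_host)


print(fixed_rule_layout(2, 3))  # None, like "qsub -pe 3per 2" above
print(fixed_rule_layout(6, 3))  # [3, 3]
```

In this model, "-pe 3per 4" also yields None, which is the case this issue asks to change.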

This is a P1 because a very important customer wants it.  Now.  Please.  Thank
you.  See the CR for details.

   ------- Additional comments from reuti Thu Dec 4 12:30:52 -0700 2008 -------
Why not use $fill_up in this case? Otherwise one could equally ask for the 4 slots to be spread over four
different hosts, which makes the allocation rule useless.

Or do you want "int" interpreted as a maximum per node, or per job? I think this could be done with an RQS.

The fact that such a request is not refused right now could be handled by a check in the upcoming qsub submit filter.

   ------- Additional comments from templedf Thu Dec 4 12:52:35 -0700 2008 -------
If I use $fill_up with 4-core machines, then "qsub -pe fillup 4" gets me 4 on
one machine.  I want 3 on one machine and 1 on another.
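
To illustrate why $fill_up does not give the 3+1 split, here is a simplified sketch. It assumes the hosts are visited in a fixed order and that each is filled before the next one is used; the function name `fill_up_layout` and the list of free slots are hypothetical, and the real scheduler also weighs load and other criteria.

```python
def fill_up_layout(total_slots, free_slots_per_host):
    """Simplified model of $fill_up: take as many slots as possible
    from each host in turn (host order is an assumption here).

    Hypothetical helper, not part of SGE. Returns the per-host slot
    counts, or None when the hosts do not offer enough slots.
    """
    layout = []
    remaining = total_slots
    for free in free_slots_per_host:
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            layout.append(take)
            remaining -= take
    return layout if remaining == 0 else None


# Two empty 4-core machines, 4 slots requested.
print(fill_up_layout(4, [4, 4]))  # [4]
```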

The solution you're suggesting with RQS is interesting, but doesn't quite solve
the problem.  I want there to be exactly <n> slaves per host, except for the
last host, which will get <t % n> slaves.  The fillup/RQS solution will give me
no more than <n> per host, but could give me fewer.
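
The requested layout can be written down as a short sketch, with t as the total number of slots and n as the integer allocation rule. The function name `requested_layout` is hypothetical; this shows the proposal in this ticket, not existing SGE behavior.

```python
def requested_layout(total_slots, per_host):
    """Proposed rule: exactly per_host slaves on each host, with the
    remainder (total_slots % per_host) placed on one last host.

    Hypothetical helper, not part of SGE.
    """
    if per_host <= 0:
        raise ValueError("per_host must be a positive integer")
    full_hosts, remainder = divmod(total_slots, per_host)
    layout = [per_host] * full_hosts
    if remainder:
        layout.append(remainder)
    return layout


print(requested_layout(4, 3))  # [3, 1]
```

Requests that are exact multiples keep their current layout under this sketch; only the handling of the remainder is new.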

   ------- Additional comments from reuti Thu Dec 4 13:06:38 -0700 2008 -------
I see, but you still want to use the 4th core on the 1st machine for other jobs? Otherwise the queue could
have just 3 slots which would give you the 3+1 when the 1st node is empty at the beginning.

What should happen on the second node, from which you want one slot: should the remaining slots be
available to other jobs, or should this node be blocked for further use?

   ------- Additional comments from templedf Thu Dec 4 13:25:36 -0700 2008 -------
Limiting the slots by queue won't work because it must be possible to run 2per,
3per, 4per, etc. all on the same queue.  The slots in the queue that aren't used
by the PE job should be available for other (PE or non-PE) jobs to use.

   ------- Additional comments from reuti Thu Dec 4 15:03:12 -0700 2008 -------
Then I would suggest giving this allocation rule a new syntax.

Until now I have used the fixed allocation rule to direct jobs with a slot count divisible by 2 to one part
of the cluster (hence allocation rule 2), and jobs with an odd number of slots to the other part. SGE was
able to detect that a request for 7 can't be satisfied by allocation rule 2. With the requested behavior,
I could instead get 2+2+2+1 (yes, in my setup jobs with an even slot count were allowed to get 1+1+1+1,
but the same number of slots on every node was a requirement).

Change History (1)

comment:1 Changed 8 years ago by dlove

  • Priority changed from highest to normal
  • Severity set to minor