Opened 12 years ago
Last modified 10 years ago
#609 new enhancement
IZ2819: PE int allocation rule should work with non-multiples
Reported by: | templedf | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | sge | Version: | 6.2 |
Severity: | minor | Keywords: | scheduling |
Cc: |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2819]
Issue #: 2819
Platform: All
Reporter: templedf (templedf)
Component: gridengine
OS: All
Subcomponent: scheduling
Version: 6.2
CC: None defined
Status: NEW
Priority: P1
Resolution:
Issue type: ENHANCEMENT
Target milestone: ---
Assigned to: andreas (andreas)
QA Contact: andreas
URL:
* Summary: PE int allocation rule should work with non-multiples
Status whiteboard:
Attachments:
Issue 2819 blocks:
Votes for issue 2819:

Opened: Thu Dec 4 12:20:00 -0700 2008
------------------------

Let's look at an example of the current behavior:

    # qconf -sp 3per
    pe_name            3per
    slots              999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    3
    control_slaves     FALSE
    job_is_first_task  TRUE
    urgency_slots      min
    # qconf -sq all.q | grep pe_list
    pe_list               make 3per
    # qsub -pe 3per 2 examples/jobs/sleeper.sh
    Your job 21408 ("Sleeper") has been submitted
    # qstat -f
    queuename                      qtype used/tot. load_avg arch          states
    ----------------------------------------------------------------------------
    all.q@ultra20                  BIP   0/4       0.12     sol-amd64

    ############################################################################
     - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
    ############################################################################
      21408 0.55500 Sleeper    root         qw    12/01/2008 10:32:02     2
    -bash-3.2# qstat -j 21408
    ==============================================================
    job_number:                 21408
    exec_file:                  job_scripts/21408
    submission_time:            Mon Dec 1 10:32:02 2008
    owner:                      root
    uid:                        0
    group:                      root
    gid:                        0
    sge_o_home:                 /root
    sge_o_log_name:             root
    sge_o_path:                 /usr/local/sge6.2/bin/sol-amd64:/usr/sbin:/usr/bin
    sge_o_shell:                /usr/bin/bash
    sge_o_tz:                   US/Pacific
    sge_o_workdir:              /root
    sge_o_host:                 ultra20
    account:                    sge
    mail_list:                  root@ultra20
    notify:                     FALSE
    job_name:                   Sleeper
    jobshare:                   0
    shell_list:                 /bin/sh
    env_list:
    script_file:                /usr/local/sge6.2/examples/jobs/sleeper.sh
    parallel environment:       3per range: 2
    scheduling info:            cannot run in PE "3per" because it only offers 0 slots

I have a PE with an int allocation rule. Unless I submit jobs that request a multiple of that int, the jobs won't be scheduled. Instead, it should be possible to submit a job with, for example, -pe 3per 4, and get three slaves on one host and 1 slave on another.

This is a P1 because a very important customer wants it. Now. Please. Thank you. See the CR for details.

------- Additional comments from reuti Thu Dec 4 12:30:52 -0700 2008 -------
Why not use $fill_up in this case? Otherwise you could also say to get 4 slots on four different hosts, which makes the allocation rule useless.

Or: you want "int" interpreted as a maximum per node, or per job? This could be done in an RQS I think.

That it's not refused right now could be checked in the coming qsub submit-filter.

------- Additional comments from templedf Thu Dec 4 12:52:35 -0700 2008 -------
If I use $fill_up with 4-core machines, then "qsub -pe fillup 4" gets me 4 on one machine. I want 3 on one machine and 1 on another.

The solution you're suggesting with RQS is interesting, but doesn't quite solve the problem. I want there to be exactly <n> slaves per host, except for the last host which will get <t % n> slaves. The fillup/RQS solution will give me no more than <n> per host, but could give me less.

------- Additional comments from reuti Thu Dec 4 13:06:38 -0700 2008 -------
I see, but you still want to use the 4th core on the 1st machine for other jobs? Otherwise the queue could have just 3 slots which would give you the 3+1 when the 1st node is empty at the beginning.

What shall happen on the second node from which you want one slot - use the remaining slots for other jobs or shall this node be blocked for further usage?

------- Additional comments from templedf Thu Dec 4 13:25:36 -0700 2008 -------
Limiting the slots by queue won't work because it must be possible to run 2per, 3per, 4per, etc. all on the same queue.
The slots in the queue that aren't used by the PE job should be available for other (PE or non-PE) jobs to use.

------- Additional comments from reuti Thu Dec 4 15:03:12 -0700 2008 -------
Then I would suggest to give this allocation rule a new syntax. For now I used the fixed allocation rule to direct jobs divisible by 2 to one part of the cluster (hence allocation rule 2), and odd number of slots to the other part. SGE was able to discover that a request for 7 can't be satisfied by allocation rule 2. With the requested behavior, this would mean that I could get 2+2+2+1 (yes, in my setup even slot count jobs were allowed to get 1+1+1+1 - but on all nodes the same number of slots was the necessity).
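
For reference, the two behaviors discussed in this ticket can be compared with a small sketch. This is not Grid Engine code: the function names `strict_allocation` and `requested_allocation` are hypothetical, and the sketch only computes the per-host slot counts for a job of `total_slots` under an integer allocation rule `per_host`. It assumes enough free hosts exist and ignores queue slot limits and load.

```python
def _check(total_slots, per_host):
    # Both arguments must be positive integers for the split to make sense.
    if total_slots < 1 or per_host < 1:
        raise ValueError("total_slots and per_host must be positive integers")


def strict_allocation(total_slots, per_host):
    """Behavior described in the ticket: only exact multiples are schedulable.

    Returns the per-host slot counts, or None if the job would stay pending.
    """
    _check(total_slots, per_host)
    if total_slots % per_host != 0:
        return None
    return [per_host] * (total_slots // per_host)


def requested_allocation(total_slots, per_host):
    """Requested behavior: exactly per_host slots on every host except the
    last one, which receives the remainder (total_slots % per_host)."""
    _check(total_slots, per_host)
    full_hosts, remainder = divmod(total_slots, per_host)
    allocation = [per_host] * full_hosts
    if remainder:
        allocation.append(remainder)
    return allocation


if __name__ == "__main__":
    print(strict_allocation(2, 3))     # the "-pe 3per 2" case from the report
    print(requested_allocation(4, 3))  # the "-pe 3per 4" case
    print(requested_allocation(7, 2))  # reuti's allocation rule 2 example
```

Under these assumptions, the "-pe 3per 4" request yields 3+1 and a request for 7 slots with rule 2 yields 2+2+2+1, which is the case reuti points out would no longer be rejected.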
Change History (1)
comment:1 Changed 10 years ago by dlove
- Priority changed from highest to normal
- Severity set to minor