[GE users] Resource reservation fails for large job
andreas.haas at sun.com
Wed Jun 3 16:04:59 BST 2009
On Wed, 3 Jun 2009, s_kreidl wrote:
> Hi Andreas,
> thanks for pointing that out, I didn't see the wood for the trees (sorry, literal German translation).
> However, the jobs are not submitted with any hard (or soft) runtime limit (no "hard resource_list: ..." at all for the small jobs). I checked again and the only setting for the jobs' duration is done within the queue configuration, see below.
These queue limits play no role in the reservation calculation. The reason is that
they can differ per queue and are thus not (always) known before jobs actually
get their assignments.
> Where does SGE get the 70 seconds from?
I think this comes from default_duration in sched_conf(5).
To verify, run:
# qconf -ssconf | grep default_duration
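To compare the configured value with the 70 seconds seen in the schedule file, an h:m:s value can be converted to seconds. This is only a minimal sketch: the function name hms_to_seconds is made up for illustration, the value 0:1:10 is a hypothetical example (not taken from the poster's cluster), and the function does not handle a setting of INFINITY.

```shell
# Convert an h:m:s time value (the format default_duration uses) to seconds.
# Hypothetical helper; does not handle the special value INFINITY.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print ($1 * 3600) + ($2 * 60) + $3 }'
}

# Example value only: a default_duration of 0:1:10 would give 70 seconds.
hms_to_seconds 0:1:10

# On a live cluster the configured value could be fed in like this:
#   hms_to_seconds "$(qconf -ssconf | awk '/default_duration/ { print $2 }')"
```

If the result matches the duration shown in the schedule file, the scheduler is falling back to default_duration for jobs without their own runtime request.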
> Even more irritating, the reservation times, e.g. 1244742740 (=Thu, 11 Jun 2009 17:52:20 GMT) indeed seem to take the queues soft runtime limit of 10 days into account.
Hm. I could look into this, but I need a more complete 'schedule' file.
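When reading a 'schedule' file, the epoch timestamps and durations can be checked by hand. The sketch below assumes GNU date (the -d @N syntax) and uses two helper names, epoch_to_utc and hours_to_seconds, that are made up for illustration.

```shell
# Convert an epoch timestamp to a readable UTC time.
# Uses GNU date syntax (-d @N); BSD date would need "date -u -r N" instead.
epoch_to_utc() {
    LC_ALL=C date -u -d "@$1" '+%a, %d %b %Y %H:%M:%S GMT'
}

# Convert a number of hours (e.g. the 240 in s_rt 240:00:00) to seconds.
hours_to_seconds() {
    echo $(( $1 * 3600 ))
}

epoch_to_utc 1244742740    # the reservation time quoted above
hours_to_seconds 240       # the queue's s_rt of 240 hours
```

The first call prints Thu, 11 Jun 2009 17:52:20 GMT and the second prints 864000, so a duration of 864000 seconds in the schedule file would correspond to the 10-day s_rt of the queue.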
> Does resource reservation only work, if the job has a fixed h_rt or s_rt value (if so which one?) provided with the -l option? Or must a corresponding complex be enforced? What exactly does one need to do to get resource reservation running?
Either way, you need to have control over job wall-clock times. Otherwise any reservation scheduling is in vain.
To get a reliable default for each job, add -l h_rt=... or -l s_rt=... to the site-wide sge_request(5) file;
this default gets picked up by any job that is submitted.
To override the defaults, the same options can be used on the qsub command line.
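As a rough illustration of the two places where the runtime limit can be set, the sketch below only prints the relevant lines and does not touch a live cluster. The 24-hour default, the 2-hour override and the script name job.sh are assumptions chosen for the example, and the two function names are hypothetical.

```shell
# Sketch only: prints the lines involved instead of changing anything.

# Assumed site-wide default runtime; pick a value that suits the site.
DEFAULT_H_RT="24:00:00"

# Line to add to the site-wide request file
# ($SGE_ROOT/$SGE_CELL/common/sge_request, see sge_request(5)).
sge_request_line() {
    printf '%s\n' "-l h_rt=$DEFAULT_H_RT"
}

# Command line a user could run to override the default for one job.
# -R y asks the scheduler to reserve resources for this job.
# $1 is the runtime limit, $2 the job script (job.sh is a made-up name).
qsub_override_cmd() {
    printf 'qsub -R y -l h_rt=%s %s\n' "$1" "$2"
}

sge_request_line
qsub_override_cmd 2:00:00 job.sh
```

Note that reservations are only made when max_reservation in sched_conf(5) is greater than zero and the job that should get the reservation is submitted with -R y.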
> Thanks again,
> Output of "qconf -sq par.q":
> qname par.q
> hostlist @par_queue
> seq_no 0
> load_thresholds np_load_avg=1.10
> suspend_thresholds NONE
> nsuspend 1
> suspend_interval 00:02:30
> priority 0
> min_cpu_interval 00:02:30
> processors UNDEFINED
> qtype BATCH INTERACTIVE
> ckpt_list NONE
> pe_list openmp openmpi-1perhost openmpi-2perhost \
> openmpi-4perhost openmpi-8perhost openmpi-fillup \
> rerun TRUE
> slots 8
> tmpdir /tmp
> shell /bin/bash
> prolog NONE
> epilog NONE
> shell_start_mode posix_compliant
> starter_method /usr/sge/bin/lx24-amd64/start.sh
> suspend_method SIGTSTP
> resume_method NONE
> terminate_method NONE
> notify 00:00:60
> owner_list NONE
> user_lists standard_users power_users
> xuser_lists gr_cb01
> subordinate_list NONE
> complex_values NONE
> projects NONE
> xprojects NONE
> calendar NONE
> initial_state default
> s_rt 240:00:00
> h_rt 336:00:00
> s_cpu INFINITY
> h_cpu INFINITY
> s_fsize INFINITY
> h_fsize INFINITY
> s_data INFINITY
> h_data INFINITY
> s_stack INFINITY
> h_stack INFINITY
> s_core INFINITY
> h_core INFINITY
> s_rss INFINITY
> h_rss INFINITY
> s_vmem INFINITY
> h_vmem INFINITY
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].