[GE users] Resource reservation fails for large job

andreas andreas.haas at sun.com
Wed Jun 3 16:04:59 BST 2009



Hi Sabine,

On Wed, 3 Jun 2009, s_kreidl wrote:

> Hi Andreas,
>
> thanks for pointing that out, I didn't see the wood for the trees (sorry, literal German translation).
>
> However, the jobs are not submitted with any hard (or soft) runtime limit (no "hard resource_list: ..." at all for the small jobs). I checked again and the only setting for the jobs' duration is done within the queue configuration, see below.

These queue limits play no role in the reservation calculation. The reason is that
they can differ per queue and are thus not (always) known before jobs actually
get their assignments.
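The rule above can be sketched as a hypothetical bash function, `reservation_duration` (not SGE source code). The runtime the scheduler plans with comes from the job's own request and otherwise falls back to the scheduler's default_duration. The queue's s_rt/h_rt never enter the calculation. Checking h_rt before s_rt is an assumption of this sketch, and all values are plain seconds to keep it short.

```bash
#!/usr/bin/env bash
# Hypothetical model of which runtime is assumed when planning a reservation.
#   $1 = job's -l h_rt request in seconds, or "" if none
#   $2 = job's -l s_rt request in seconds, or "" if none
#   $3 = default_duration from sched_conf(5), in seconds
# Queue-level s_rt/h_rt are deliberately not parameters: they are unknown
# until the job has been assigned to a queue.
reservation_duration() {
    local job_h_rt=$1 job_s_rt=$2 default_duration=$3
    if [ -n "$job_h_rt" ]; then
        # Assumption: the hard limit is checked first.
        echo "$job_h_rt"
    elif [ -n "$job_s_rt" ]; then
        echo "$job_s_rt"
    else
        # No runtime requested by the job: fall back to the scheduler default.
        echo "$default_duration"
    fi
}

# Per the explanation above, a job submitted to par.q (s_rt=240:00:00)
# without -l h_rt/s_rt is still planned with default_duration, e.g. 70 s:
reservation_duration "" "" 70
```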

> Where does SGE get the 70 seconds from?

I think this comes from the default_duration parameter in sched_conf(5):

    http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/devel/rfe/resource_reservation.txt

To verify, run:

    # qconf -ssconf | grep default_duration
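To compare the value printed by the command above with the 70 seconds seen in the schedule, a small hypothetical helper, `sge_time_to_seconds`, can convert it. It assumes the value appears either as H:M:S or as a plain number of seconds. INFINITY is passed through unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert an SGE time value (H:M:S or plain seconds)
# to seconds; INFINITY is echoed back as-is.
sge_time_to_seconds() {
    local value=$1
    if [ "$value" = "INFINITY" ]; then
        echo INFINITY
    elif [[ $value =~ ^([0-9]+):([0-9]+):([0-9]+)$ ]]; then
        # 10# forces base 10 so fields such as "08" are not read as octal.
        echo $(( 10#${BASH_REMATCH[1]} * 3600 + 10#${BASH_REMATCH[2]} * 60 + 10#${BASH_REMATCH[3]} ))
    elif [[ $value =~ ^[0-9]+$ ]]; then
        echo $(( 10#$value ))
    else
        echo "unrecognized time value: $value" >&2
        return 1
    fi
}

# Usage on a live cluster (requires qconf in PATH):
#   sge_time_to_seconds "$(qconf -ssconf | awk '$1 == "default_duration" {print $2}')"
```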

> Even more irritating, the reservation times, e.g. 1244742740 (= Thu, 11 Jun 2009 17:52:20 GMT), indeed seem to take the queue's soft runtime limit of 10 days into account.

Hm. I could look into this, but I need a more complete 'schedule' file.
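The figures quoted in the question can be double-checked from the shell. This sketch assumes GNU date, which accepts `-d @<epoch>`. BSD/macOS date uses `-r <epoch>` instead. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Decode a Unix timestamp, such as a reservation start time, as GMT.
epoch_to_gmt() {
    LC_ALL=C date -u -d "@$1" '+%a, %d %b %Y %H:%M:%S GMT'
}

# Convert a whole number of hours (e.g. par.q's s_rt of 240:00:00) to whole days.
hours_to_days() {
    echo $(( $1 / 24 ))
}

epoch_to_gmt 1244742740
hours_to_days 240
```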

>
> Does resource reservation only work if the job has a fixed h_rt or s_rt value (and if so, which one?) provided with the -l option? Or must a corresponding complex be enforced? What exactly does one need to do to get resource reservation running?

Either way, you need control over job wall-clock times; otherwise any reservation scheduling is in vain.
To get a reliable default for every job, add -l h_rt=... or -l s_rt=... to the site-wide sge_request(5) file:

    $SGE_ROOT/default/common/sge_request

This default is picked up by every job that is submitted.
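Adding such a default can be scripted so that reruns do not duplicate the line. The sketch below is a hypothetical helper, `add_default_request`. It appends the option line only if that exact line is not already present. The 24:00:00 value in the usage comment is a placeholder, not a recommendation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a default request line to an sge_request(5)
# file once; leave the file unchanged if the exact line already exists.
add_default_request() {
    local request_file=$1 option_line=$2
    touch "$request_file"
    # -x: match whole line, -F: fixed string, --: line may start with "-"
    if ! grep -qxF -- "$option_line" "$request_file"; then
        printf '%s\n' "$option_line" >> "$request_file"
    fi
}

# Example (placeholder limit; choose a site-appropriate value):
#   add_default_request "$SGE_ROOT/default/common/sge_request" "-l h_rt=24:00:00"
```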

To override the default, pass the same options on the qsub command line.
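A per-job override might be wrapped as in the following hypothetical helper, `submit_with_h_rt`. It checks that the limit looks like H:M:S and then calls `qsub -l h_rt=...` for one script. Setting `DRY_RUN=1` prints the command instead of submitting it, which is an addition of this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: submit a job script with an explicit wall-clock
# limit on the command line, overriding the sge_request default for this job.
# Set DRY_RUN=1 to print the qsub command instead of running it.
submit_with_h_rt() {
    local h_rt=$1 script=$2
    if ! [[ $h_rt =~ ^[0-9]+:[0-9]+:[0-9]+$ ]]; then
        echo "h_rt must look like H:M:S, got: $h_rt" >&2
        return 1
    fi
    local cmd=(qsub -l "h_rt=$h_rt" "$script")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```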

Regards,
Andreas

>
> Thanks again,
> Sabine
>
>
> Output of "qconf -sq par.q":
>
> qname                 par.q
> hostlist              @par_queue
> seq_no                0
> load_thresholds       np_load_avg=1.10
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:02:30
> priority              0
> min_cpu_interval      00:02:30
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               openmp openmpi-1perhost openmpi-2perhost \
>                      openmpi-4perhost openmpi-8perhost openmpi-fillup \
>                      openmpi-roundrobin
> rerun                 TRUE
> slots                 8
> tmpdir                /tmp
> shell                 /bin/bash
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        /usr/sge/bin/lx24-amd64/start.sh
> suspend_method        SIGTSTP
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            standard_users power_users
> xuser_lists           gr_cb01
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  240:00:00
> h_rt                  336:00:00
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=200671
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

http://gridengine.info/

Registered office: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Register court: Amtsgericht Muenchen, HRB 161028
Managing directors: Thomas Schröder, Wolfgang Engels, Wolf Frenkel
Chairman of the supervisory board: Martin Haering

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=200677



