[GE users] Resource reservation fails for large job

dougalb dougal.lists at gmail.com
Wed Jun 3 15:48:13 BST 2009



Sabine,

I am very curious about this as well. We have a lot of small jobs
filling the queues and not letting the big parallel jobs through. Have
you been able to test with h_rt set for the small jobs?

-Dougal

On Wed, Jun 3, 2009 at 4:26 PM, s_kreidl <sabine.kreidl at uibk.ac.at> wrote:
> Hi Andreas,
>
> thanks for pointing that out, I didn't see the wood for the trees (sorry, literal German translation).
>
> However, the jobs are not submitted with any hard (or soft) runtime limit (no "hard resource_list: ..." at all for the small jobs). I checked again, and the only limit on the jobs' duration is set in the queue configuration, see below.
> Where does SGE get the 70 seconds from?
>
> Even more irritating, the reservation times, e.g. 1244742740 (= Thu, 11 Jun 2009 17:52:20 GMT), do seem to take the queue's soft runtime limit of 10 days into account.
>
> Does resource reservation only work if the job has a fixed h_rt or s_rt value (and if so, which one?) provided with the -l option? Or must a corresponding complex be enforced? What exactly does one need to do to get resource reservation running?
>
> Thanks again,
> Sabine
>
>
> Output of "qconf -sq par.q":
>
> qname                 par.q
> hostlist              @par_queue
> seq_no                0
> load_thresholds       np_load_avg=1.10
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:02:30
> priority              0
> min_cpu_interval      00:02:30
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               openmp openmpi-1perhost openmpi-2perhost \
>                      openmpi-4perhost openmpi-8perhost openmpi-fillup \
>                      openmpi-roundrobin
> rerun                 TRUE
> slots                 8
> tmpdir                /tmp
> shell                 /bin/bash
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        /usr/sge/bin/lx24-amd64/start.sh
> suspend_method        SIGTSTP
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            standard_users power_users
> xuser_lists           gr_cb01
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  240:00:00
> h_rt                  336:00:00
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=200671
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>
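
For anyone who wants to check the arithmetic in the quoted message, here is a minimal Python sketch. It converts the epoch reservation time 1244742740 to UTC and turns the par.q limits (s_rt 240:00:00, h_rt 336:00:00) into days. The helper names parse_sge_time and to_utc are made up for this illustration and are not part of SGE. Subtracting s_rt from the reservation time rests on the assumption that the scheduler computed it as "start of a running job plus s_rt"; the thread does not confirm this.

```python
from datetime import datetime, timedelta, timezone


def parse_sge_time(value: str) -> timedelta:
    """Convert an 'hours:minutes:seconds' limit such as '240:00:00' to a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def to_utc(epoch: int) -> datetime:
    """Interpret an epoch timestamp (seconds since 1970-01-01) as a UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


# Queue limits copied from the "qconf -sq par.q" output above.
s_rt = parse_sge_time("240:00:00")
h_rt = parse_sge_time("336:00:00")

# Reservation time quoted in the message.
reservation = to_utc(1244742740)

# ASSUMPTION (not confirmed in the thread): reservation = job start + s_rt.
implied_start = reservation - s_rt

print(s_rt.days, h_rt.days)       # 10 14
print(reservation.isoformat())    # 2009-06-11T17:52:20+00:00
print(implied_start.isoformat())  # 2009-06-01T17:52:20+00:00
```

If that assumption holds, the reservation would line up with a job that started on 1 June at 17:52:20 GMT and is expected to run for the full 10 days of s_rt.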

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=200673
