[GE users] Advance reservation strange behavior

Andreas Haas Andreas.Haas at Sun.COM
Tue Jun 13 16:00:19 BST 2006



Have you tried setting MONITOR in the scheduler's params (see
sched_conf(5))? It lets you monitor the reservation your job got
and how it changes over time.
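
A minimal sketch of how that monitoring output could be inspected. It
assumes the scheduler writes one colon-separated record per line to
$SGE_ROOT/$SGE_CELL/common/schedule, in the layout I recall from
sched_conf(5); please check the man page for your version. The helper
name reservations_for is purely illustrative.

```shell
# Enable monitoring first (interactive; opens the scheduler configuration
# in an editor):
#   qconf -msconf        # then set:  params  MONITOR=1
#
# reservations_for JOBID [FILE]
# Print the RESERVING records of one job from a schedule file.
# Assumed record layout (from sched_conf(5), not verified here):
#   jobid:taskid:state:start_time:duration:level:object:resource:utilization
reservations_for() {
    job_id=$1
    # Default location is an assumption; pass FILE explicitly if it differs.
    schedule_file=${2:-"$SGE_ROOT/$SGE_CELL/common/schedule"}
    awk -F: -v job="$job_id" '$1 == job && $3 == "RESERVING"' "$schedule_file"
}
```

Running "reservations_for 13276" after a few scheduling intervals should
show whether, and on which queue instances, slots are being held for the
job. If no RESERVING records appear at all, the reservation is probably
not taking effect.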

Also, how about your DURATION_OFFSET? It was introduced in 6.0u5

   http://gridengine.sunsource.net/issues/show_bug.cgi?id=1662

to deal with prolog/epilog overhead.
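
To illustrate what the offset is for: as I understand it, the scheduler
plans a reservation as the job's net runtime (h_rt, or default_duration
if none was requested) plus DURATION_OFFSET, so that prolog/epilog time
is covered. The sketch below only does that arithmetic; the function
names are hypothetical, and the exact value syntax for DURATION_OFFSET
in params should be taken from sched_conf(5).

```shell
# hms_to_seconds HH:MM:SS
# Convert a duration in hours:minutes:seconds to seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print ($1 * 3600) + ($2 * 60) + $3 }'
}

# planned_window NET_RUNTIME OFFSET   (both HH:MM:SS)
# Seconds of resource usage the scheduler would plan for, assuming
# planned time = net runtime + offset.
planned_window() {
    echo $(( $(hms_to_seconds "$1") + $(hms_to_seconds "$2") ))
}

# Example: a job with h_rt=01:00:00 and a five-minute offset
#   planned_window 01:00:00 00:05:00
```

If the prolog and epilog together take longer than the offset, resources
become free later than planned, and a reservation built on that plan can
slip.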

Regards,
Andreas



On Tue, 13 Jun 2006, Jean-Paul Minet wrote:

> Stephan,
>
> > which version are you running? We had a bug with resource reservation jobs
>
> We are using 6.0u6
>
> > and the ticket calculation. It should only affect the qstat output and
> > not the startup order.
>
> Well, since I posted, the job has finally got some tickets (and "climbed"
> in the qstat output), but I still see small jobs being launched without any CPU
> being set aside for the big job.  Here is the qstat output:
>
> lemaitre /gridware/sge/default/common # qstat
> job-ID  prior   name       user         state submit/start at     queue              slots ja-task-ID
> -----------------------------------------------------------------------------------------------------
>   13411 0.00030 job_sub    morari       r     06/11/2006 00:32:19 all.q@lmexec-100       4
>   13570 0.00124 phon121    sdubois      r     06/12/2006 22:36:59 all.q@lmexec-101       2
>   13304 0.00011 run        hermet       r     06/09/2006 16:49:46 all.q@lmexec-103       4
>   13372 0.00030 job_sub    morari       r     06/10/2006 19:14:32 all.q@lmexec-103       4
>   13481 0.00230 R53Cd      detraux      r     06/12/2006 10:04:41 all.q@lmexec-105       1
>   13236 0.01343 flexion_p7 acleerem     r     06/09/2006 05:04:40 all.q@lmexec-108       1
>   13338 0.01343 flexion_pr acleerem     r     06/10/2006 00:07:28 all.q@lmexec-112       1
>   13595 0.00030 alf_lda4C  morari       r     06/13/2006 10:46:51 all.q@lmexec-116       4
>   13611 0.00124 phon1230   sdubois      r     06/13/2006 14:06:26 all.q@lmexec-117       2
>   13346 0.01343 flexion_c4 acleerem     r     06/10/2006 05:03:48 all.q@lmexec-119       1
>   13253 0.00011 run        hermet       r     06/09/2006 10:32:41 all.q@lmexec-122       4
>   13511 0.01124 C3-3       gmossoux     r     06/12/2006 12:56:14 all.q@lmexec-123       1
>   13500 0.00011 run        hermet       r     06/12/2006 10:52:26 all.q@lmexec-125       4
>   13502 0.00030 co_dtp     morari       r     06/12/2006 12:00:34 all.q@lmexec-126       4
>   13608 0.27512 lance.P8   demontet     r     06/13/2006 13:32:56 all.q@lmexec-129       1
>   13578 0.11046 PF3        peeters      r     06/13/2006 08:19:29 all.q@lmexec-6         4
>   13580 0.11046 PHF2       peeters      r     06/13/2006 08:59:01 all.q@lmexec-6         4
>   13223 0.02026 scriptabin gruning      r     06/08/2006 17:46:24 all.q@lmexec-62        1
>   13613 0.00095 jn4m8f_Rel bousquet     r     06/13/2006 14:16:18 all.q@lmexec-64        4
>   13602 0.02026 scriptgw   gruning      r     06/13/2006 12:13:18 all.q@lmexec-65        8
>   13581 0.00124 def124     sdubois      r     06/13/2006 09:20:48 all.q@lmexec-67        4
>   13614 0.00133 jobQ14     gmatteo      r     06/13/2006 14:37:10 all.q@lmexec-69        6
>   13520 0.00230 R.3.DD     detraux      r     06/12/2006 14:03:42 all.q@lmexec-7         1
>   13598 0.00124 trans      sdubois      r     06/13/2006 11:27:25 all.q@lmexec-7         2
>   13612 0.00124 trans      sdubois      r     06/13/2006 14:08:04 all.q@lmexec-7         4
>   13410 0.00030 job_sub    morari       r     06/10/2006 22:52:54 all.q@lmexec-70        4
>   13569 0.00124 phon123    sdubois      r     06/12/2006 22:36:59 all.q@lmexec-71        2
>   13337 0.04108 E50.inp    ledur        r     06/09/2006 23:55:57 all.q@lmexec-73        1
>   13601 1.00000 sub.titi1  driess       r     06/13/2006 12:05:49 all.q@lmexec-73        1
>   12866 0.00030 job_sub    morari       r     06/05/2006 14:31:49 all.q@lmexec-76        4
>   12984 0.04108 AB-C-357-S ledur        r     06/06/2006 15:52:56 all.q@lmexec-81        1
>   13368 0.00030 os_dtp     morari       r     06/10/2006 13:27:31 all.q@lmexec-82        4
>   13371 0.00030 job_sub    morari       r     06/10/2006 15:54:23 all.q@lmexec-84        4
>   13512 0.01124 C3-2       gmossoux     r     06/12/2006 12:56:26 all.q@lmexec-85        1
>   13254 0.00011 run        hermet       r     06/09/2006 10:36:34 all.q@lmexec-86        4
>   13436 0.00030 rudtp_C    morari       r     06/11/2006 10:50:58 all.q@lmexec-86        4
>   13486 0.00230 R56Te      detraux      r     06/12/2006 10:32:00 all.q@lmexec-90        1
>   13558 0.00133 jobGL_20   gmatteo      r     06/13/2006 11:57:43 all.q@lmexec-93       12
>   13484 0.00230 R55TeMOD   detraux      r     06/12/2006 10:20:38 all.q@lmexec-95        1
>   13485 0.00230 R56Cd      detraux      r     06/12/2006 10:20:38 all.q@lmexec-95        1
>   13571 0.00124 phon120    sdubois      r     06/12/2006 22:53:58 all.q@lmexec-96        2
>   13482 0.00230 R54Te      detraux      r     06/12/2006 10:20:38 all.q@lmexec-98        1
>   13483 0.00230 R55Cd      detraux      r     06/12/2006 10:20:38 all.q@lmexec-98        1
>   13606 0.01447 scriptgw   gruning      qw    06/13/2006 12:34:26                        8
>   13276 0.00994 VICPFM     cocle        qw    06/09/2006 12:41:15                       12
>   13391 0.00659 gga_sio2   shaltaf      qw    06/10/2006 15:08:46                        8
>   13266 0.00649 VICPFM     cocle        qw    06/09/2006 11:31:08                       40
>   13393 0.00330 sio2_lda_s shaltaf      qw    06/10/2006 15:18:05                        8
>   13394 0.00220 sio2_lda_o shaltaf      qw    06/10/2006 15:18:51                        8
>   13395 0.00165 sio2_gga_s shaltaf      qw    06/10/2006 15:22:09                        8
>   13396 0.00132 sio2_gga_o shaltaf      qw    06/10/2006 15:22:11                        8
>
> Jobs 13276 and 13266 have both requested reservation, but many jobs with lower
> priority were started (13483, 13571, 13485,...)
>
> Any clue?
>
> jp
>
> > An upgrade to u8 might be good idea.
>
> Will do, probably tomorrow.
>
> Jean-Paul
>
> > Cheers,
> > Stephan
> >
> > Jean-Paul Minet wrote:
> >
> >>>   Hi Jean-Paul,
> >>>
> >>>     When you submit these jobs, do you set the option "-R y"?
> >>
> >>
> >>
> >> Yes, I did.  A "qstat -j job_id" shows that SGE correctly picked up
> >> this option.
> >>
> >> jp
> >>
> >>>                                                   Regards
> >>>
> >>> On Mon, 12 Jun 2006 16:19:14 +0200
> >>> Jean-Paul Minet <minet at cism.ucl.ac.be> wrote:
> >>>
> >>>
> >>>> Craig,
> >>>>
> >>>> [...]
> >>>>
> >>>>
> >>>>>> When I submit a job with advanced reservation (test_ar), it isn't
> >>>>>> allocated any priority (tickets only, based on fair share only):
> >>>>>>
> >>>>>> job-ID  prior   nurg    npprior ntckts   ppri name       user         state submit/start at     queue  slots ja-task-ID
> >>>>>> -----------------------------------------------------------------------------------------------------------------------
> >>>>>> [...]
> >>>>>>   13149 0.00000 0.00000 0.00000 0.00000     0 test_ar    root         qw    06/08/2006 11:46:48           20
> >>>>>>
> >>>>>> If I then submit the same job but without advanced reservation
> >>>>>> (test), it gets its normal priority, while the advanced
> >>>>>> reservation job remains at 0 priority.
> >>>>>
> >>>>>
> >>>>
> >>>> [...]
> >>>>
> >>>>
> >>>>> SGE does not support the gathering of resources prior to runtime to
> >>>>> ensure that the code runs at the time requested (advanced
> >>>>> reservations).
> >>>>> It does support the running of code at a requested time, and if at
> >>>>> that time the job does not have enough resources it will aggressively
> >>>>> gather those resources until it can run (reservations).
> >>>>
> >>>>
> >>>>
> >>>> Sorry, I probably expressed myself poorly...  I was talking about
> >>>> "resource reservation", i.e. preventing lower-priority small (in
> >>>> terms of slots) jobs from running before jobs that require a higher
> >>>> number of slots.  It is not a question of a specific time for the job
> >>>> to run, but of respecting priority order: keeping CPUs aside as soon
> >>>> as they become free, to gather enough of them for a big job to run
> >>>> (instead of dispatching small jobs).
> >>>>
> >>>> Am I wrong somewhere?
> >>>>
> >>>> Jean-Paul
> >>>>
> >>>>
> >>>>> What time was the job scheduled to run?  I thought that the priority
> >>>>> for jobs with reservations stays zero until after the start time
> >>>>> has passed.
> >>>>>
> >>>>> Craig
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Thanks for any info/help
> >>>>>>
> >>>>>> Jean-paul
> >>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>> --
> >>>> Jean-Paul Minet
> >>>> Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
> >>>> Université Catholique de Louvain
> >>>> Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> - -- ============================================
> >>>  Rui Manuel dos Santos Ramos
> >>>
> >>>  Instituto de Recursos e Iniciativas Comuns
> >>>  Praca Gomes Teixeira, 4099-002 Porto, Portugal
> >>>
> >>>  phone : +351 223 401 571
> >>>  e-mail: rramos[at]iric.up.pt
> >>>     web: http://ruiramos.homeip.net
> >>> ============================================
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
> >
> >
>
>
>
>




