[GE users] Advance reservation strange behavior

Jean-Paul Minet minet at cism.ucl.ac.be
Tue Jun 13 13:51:36 BST 2006



Stephan,

> which version are you running? We had a bug with resource reservation jobs

We are using 6.0u6

> and the ticket calculation. It should only affect the qstat output and
> not the startup order.

Well, since I posted, the job has finally got some tickets (and "climbed"
in the qstat output), but I still see small jobs being launched without any
CPUs being set aside for the big job. Here is the qstat output:

lemaitre /gridware/sge/default/common # qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  13411 0.00030 job_sub    morari       r     06/11/2006 00:32:19 all.q@lmexec-100                   4
  13570 0.00124 phon121    sdubois      r     06/12/2006 22:36:59 all.q@lmexec-101                   2
  13304 0.00011 run        hermet       r     06/09/2006 16:49:46 all.q@lmexec-103                   4
  13372 0.00030 job_sub    morari       r     06/10/2006 19:14:32 all.q@lmexec-103                   4
  13481 0.00230 R53Cd      detraux      r     06/12/2006 10:04:41 all.q@lmexec-105                   1
  13236 0.01343 flexion_p7 acleerem     r     06/09/2006 05:04:40 all.q@lmexec-108                   1
  13338 0.01343 flexion_pr acleerem     r     06/10/2006 00:07:28 all.q@lmexec-112                   1
  13595 0.00030 alf_lda4C  morari       r     06/13/2006 10:46:51 all.q@lmexec-116                   4
  13611 0.00124 phon1230   sdubois      r     06/13/2006 14:06:26 all.q@lmexec-117                   2
  13346 0.01343 flexion_c4 acleerem     r     06/10/2006 05:03:48 all.q@lmexec-119                   1
  13253 0.00011 run        hermet       r     06/09/2006 10:32:41 all.q@lmexec-122                   4
  13511 0.01124 C3-3       gmossoux     r     06/12/2006 12:56:14 all.q@lmexec-123                   1
  13500 0.00011 run        hermet       r     06/12/2006 10:52:26 all.q@lmexec-125                   4
  13502 0.00030 co_dtp     morari       r     06/12/2006 12:00:34 all.q@lmexec-126                   4
  13608 0.27512 lance.P8   demontet     r     06/13/2006 13:32:56 all.q@lmexec-129                   1
  13578 0.11046 PF3        peeters      r     06/13/2006 08:19:29 all.q@lmexec-6                     4
  13580 0.11046 PHF2       peeters      r     06/13/2006 08:59:01 all.q@lmexec-6                     4
  13223 0.02026 scriptabin gruning      r     06/08/2006 17:46:24 all.q@lmexec-62                    1
  13613 0.00095 jn4m8f_Rel bousquet     r     06/13/2006 14:16:18 all.q@lmexec-64                    4
  13602 0.02026 scriptgw   gruning      r     06/13/2006 12:13:18 all.q@lmexec-65                    8
  13581 0.00124 def124     sdubois      r     06/13/2006 09:20:48 all.q@lmexec-67                    4
  13614 0.00133 jobQ14     gmatteo      r     06/13/2006 14:37:10 all.q@lmexec-69                    6
  13520 0.00230 R.3.DD     detraux      r     06/12/2006 14:03:42 all.q@lmexec-7                     1
  13598 0.00124 trans      sdubois      r     06/13/2006 11:27:25 all.q@lmexec-7                     2
  13612 0.00124 trans      sdubois      r     06/13/2006 14:08:04 all.q@lmexec-7                     4
  13410 0.00030 job_sub    morari       r     06/10/2006 22:52:54 all.q@lmexec-70                    4
  13569 0.00124 phon123    sdubois      r     06/12/2006 22:36:59 all.q@lmexec-71                    2
  13337 0.04108 E50.inp    ledur        r     06/09/2006 23:55:57 all.q@lmexec-73                    1
  13601 1.00000 sub.titi1  driess       r     06/13/2006 12:05:49 all.q@lmexec-73                    1
  12866 0.00030 job_sub    morari       r     06/05/2006 14:31:49 all.q@lmexec-76                    4
  12984 0.04108 AB-C-357-S ledur        r     06/06/2006 15:52:56 all.q@lmexec-81                    1
  13368 0.00030 os_dtp     morari       r     06/10/2006 13:27:31 all.q@lmexec-82                    4
  13371 0.00030 job_sub    morari       r     06/10/2006 15:54:23 all.q@lmexec-84                    4
  13512 0.01124 C3-2       gmossoux     r     06/12/2006 12:56:26 all.q@lmexec-85                    1
  13254 0.00011 run        hermet       r     06/09/2006 10:36:34 all.q@lmexec-86                    4
  13436 0.00030 rudtp_C    morari       r     06/11/2006 10:50:58 all.q@lmexec-86                    4
  13486 0.00230 R56Te      detraux      r     06/12/2006 10:32:00 all.q@lmexec-90                    1
  13558 0.00133 jobGL_20   gmatteo      r     06/13/2006 11:57:43 all.q@lmexec-93                   12
  13484 0.00230 R55TeMOD   detraux      r     06/12/2006 10:20:38 all.q@lmexec-95                    1
  13485 0.00230 R56Cd      detraux      r     06/12/2006 10:20:38 all.q@lmexec-95                    1
  13571 0.00124 phon120    sdubois      r     06/12/2006 22:53:58 all.q@lmexec-96                    2
  13482 0.00230 R54Te      detraux      r     06/12/2006 10:20:38 all.q@lmexec-98                    1
  13483 0.00230 R55Cd      detraux      r     06/12/2006 10:20:38 all.q@lmexec-98                    1
  13606 0.01447 scriptgw   gruning      qw    06/13/2006 12:34:26                                    8
  13276 0.00994 VICPFM     cocle        qw    06/09/2006 12:41:15                                   12
  13391 0.00659 gga_sio2   shaltaf      qw    06/10/2006 15:08:46                                    8
  13266 0.00649 VICPFM     cocle        qw    06/09/2006 11:31:08                                   40
  13393 0.00330 sio2_lda_s shaltaf      qw    06/10/2006 15:18:05                                    8
  13394 0.00220 sio2_lda_o shaltaf      qw    06/10/2006 15:18:51                                    8
  13395 0.00165 sio2_gga_s shaltaf      qw    06/10/2006 15:22:09                                    8
  13396 0.00132 sio2_gga_o shaltaf      qw    06/10/2006 15:22:11                                    8

Jobs 13276 and 13266 have both requested resource reservation, but many jobs
with lower priority were started after them (13483, 13571, 13485, ...).
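The comparison can be made explicit with a small Python sketch. It only works on a handful of rows copied by hand from the listing above (job-ID, prior, state, the "submit/start at" time and slots); it does not query SGE, and the helper name `overtakers` is hypothetical. It lists running jobs whose `prior` is lower than that of a reserving job and which started after that job was submitted. As I understand SGE 6 resource reservation, a lower-priority job may still legitimately be backfilled if it is expected to finish before the reservation begins, so the result is a list of candidates, not proof of a scheduler bug.

```python
from datetime import datetime

# Rows copied by hand from the qstat listing above:
# (job-ID, prior, state, "submit/start at", slots). For running jobs the time
# is the start time; for pending (qw) jobs it is the submit time.
ROWS = [
    (13276, 0.00994, "qw", "06/09/2006 12:41:15", 12),
    (13266, 0.00649, "qw", "06/09/2006 11:31:08", 40),
    (13253, 0.00011, "r", "06/09/2006 10:32:41", 4),
    (13304, 0.00011, "r", "06/09/2006 16:49:46", 4),
    (13483, 0.00230, "r", "06/12/2006 10:20:38", 1),
    (13485, 0.00230, "r", "06/12/2006 10:20:38", 1),
    (13571, 0.00124, "r", "06/12/2006 22:53:58", 2),
    (13608, 0.27512, "r", "06/13/2006 13:32:56", 1),
    (13601, 1.00000, "r", "06/13/2006 12:05:49", 1),
]

FMT = "%m/%d/%Y %H:%M:%S"  # date format used in this qstat listing


def overtakers(rows, reserving_id):
    """Hypothetical helper: running jobs with a lower prior that started
    after `reserving_id` was submitted."""
    _, res_prior, _, res_when, _ = next(r for r in rows if r[0] == reserving_id)
    res_submit = datetime.strptime(res_when, FMT)
    return [
        job_id
        for job_id, prior, state, when, _slots in rows
        if state == "r"
        and prior < res_prior
        and datetime.strptime(when, FMT) > res_submit
    ]


if __name__ == "__main__":
    for job_id in (13276, 13266):
        print(job_id, overtakers(ROWS, job_id))
```

For both 13276 and 13266 this subset yields 13304, 13483, 13485 and 13571; 13253 is excluded because it started before either reserving job was submitted, and 13608 and 13601 because their priority is higher.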

Any clue?

jp

> An upgrade to u8 might be a good idea.

Will do, normally tomorrow.

Jean-Paul

> Cheers,
> Stephan
> 
> Jean-Paul Minet wrote:
> 
>>>   Hi Jean-Paul,
>>>
>>>     When you submit these jobs, do you set the option "-R y"?
>>
>>
>>
>> Yes, I did. A "qstat -j job_id" shows that SGE correctly picked up
>> this option.
>>
>> jp
>>
>>>                                                   Regards
>>>
>>> On Mon, 12 Jun 2006 16:19:14 +0200
>>> Jean-Paul Minet <minet at cism.ucl.ac.be> wrote:
>>>
>>>
>>>> Craig,
>>>>
>>>> [...]
>>>>
>>>>
>>>>>> When I submit a job with advanced reservation (test_ar), it doesn't
>>>>>> get allocated any priority (tickets only, based on fair share only):
>>>>>>
>>>>>> job-ID  prior   nurg    npprior ntckts   ppri name       user         state submit/start at     queue                          slots ja-task-ID
>>>>>> ----------------------------------------------------------------------------
>>>>>>
>>>>>> [...]
>>>>>>   13149 0.00000 0.00000 0.00000 0.00000     0 test_ar    root         qw    06/08/2006 11:46:48                                   20
>>>>>>
>>>>>> If I then submit the same job but without advanced reservation 
>>>>>> (test), it gets its normal priority, while the advanced 
>>>>>> reservation job remains at 0 priority.
>>>>>
>>>>>
>>>>
>>>> [...]
>>>>
>>>>
>>>>> SGE does not support the gathering of resources prior to runtime to
>>>>> ensure that the code runs at the time requested (advanced reservations).
>>>>> It does support the running of code at a requested time, and if at that
>>>>> time the job does not have enough resources it will aggressively gather
>>>>> those resources until it can run (reservations).
>>>>
>>>>
>>>>
>>>> Sorry, I probably expressed myself badly... I was talking about
>>>> "resource reservation", i.e. preventing small (in terms of slots),
>>>> lower-priority jobs from running before jobs that require a higher
>>>> number of slots. It is not a question of a specific time for the job to
>>>> run, but of respecting priority order: setting CPUs aside as soon as
>>>> they become free, until enough of them are gathered for a big job to run
>>>> (instead of dispatching small jobs).
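The behaviour described here can be illustrated with a toy scheduler. The following Python sketch is a generic reservation-with-backfilling loop in the style of EASY backfilling, not Grid Engine's scheduler code; the cluster size, job names, slot counts and run times are made up, and it assumes every job's run time is known in advance (in SGE terms, roughly what an `h_rt` request provides). With `reserve=False` the 8-slot job keeps being overtaken by 4-slot jobs; with `reserve=True` the first job that does not fit gets a reserved start time, and only jobs that cannot delay it are started ahead of it.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    slots: int
    runtime: int  # assumed known in advance and > 0 (roughly an h_rt limit)


def shadow_time(t, free, running, need):
    """Earliest time `need` slots are free, plus the slots left over then."""
    avail = free
    for end, slots in sorted(running):
        if avail >= need:
            break
        avail += slots
        t = end
    return t, avail - need


def simulate(total_slots, pending, running, reserve):
    """Return {job name: start time}; `pending` is in priority order.

    `running` holds (end_time, slots) pairs already busy at t=0. With
    reserve=True the first job that does not fit gets a reserved start time
    (the "shadow" time); lower-priority jobs may only start if they end by
    then or use slots the reserved job will not need.
    """
    if any(j.slots > total_slots for j in pending):
        raise ValueError("a job needs more slots than the cluster has")
    pending, running = list(pending), list(running)
    starts, t = {}, 0
    while pending:
        running = [(end, s) for end, s in running if end > t]  # drop finished
        free = total_slots - sum(s for _, s in running)
        shadow, spare = None, 0
        for job in list(pending):
            fits = job.slots <= free
            if fits and shadow is not None and t + job.runtime > shadow:
                # Still running at the reserved start: only allowed on spare slots.
                fits = job.slots <= spare
                if fits:
                    spare -= job.slots
            if fits:
                starts[job.name] = t
                running.append((t + job.runtime, job.slots))
                free -= job.slots
                pending.remove(job)
            elif reserve and shadow is None:
                shadow, spare = shadow_time(t, free, running, job.slots)
        if pending:
            t = min(end for end, _ in running)  # jump to the next completion
    return starts


# Hypothetical 8-slot cluster: two 4-slot jobs finish at t=4 and t=6.
RUNNING = [(4, 4), (6, 4)]
JOBS = [Job("big", 8, 10), Job("small1", 4, 5),
        Job("small2", 4, 5), Job("short", 4, 2)]

if __name__ == "__main__":
    print("no reservation:", simulate(8, JOBS, RUNNING, reserve=False))
    print("reservation:   ", simulate(8, JOBS, RUNNING, reserve=True))
```

Without reservation the 8-slot job waits until t=11 while all three 4-slot jobs overtake it; with reservation it starts at t=6, and only `short`, which ends exactly at t=6, is backfilled ahead of it. This is a teaching model of the idea, not a description of how the Grid Engine scheduler actually decides.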
>>>>
>>>> Am I wrong somewhere?
>>>>
>>>> Jean-Paul
>>>>
>>>>
>>>>> What time was the job scheduled to run?  I thought that the priority
>>>>> for jobs with reservations stays zero until after the start time 
>>>>> has passed.
>>>>>
>>>>> Craig
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Thanks for any info/help
>>>>>>
>>>>>> Jean-paul
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>>
>>>>>
>>>>
>>>> -- 
>>>> Jean-Paul Minet
>>>> Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
>>>> Université Catholique de Louvain
>>>> Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
>>>>
>>>
>>>
>>>
>>>
>>>
>>> -- ============================================
>>>  Rui Manuel dos Santos Ramos
>>>
>>>  Instituto de Recursos e Iniciativas Comuns
>>>  Praca Gomes Teixeira, 4099-002 Porto, Portugal
>>>
>>>  phone : +351 223 401 571
>>>  e-mail: rramos[at]iric.up.pt
>>>     web: http://ruiramos.homeip.net
>>> ============================================
>>>
>>>
>>
> 

-- 
Jean-Paul Minet
Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
Université Catholique de Louvain
Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
