[GE users] resource reservation not working

Ross Dickson Ross.Dickson at dal.ca
Fri Sep 21 01:54:32 BST 2007


Hi Daniel.

It does slightly resemble 2344, now that you mention it.  We have two queues
on this cluster: one for parallel jobs, and a subordinate queue for serial
jobs.

Does 2344 imply that reservation won't work properly for any site with
multiple cluster queues?  That seems a bit unlikely.  Hey, list readers: is
anyone out there running a site with multiple queues who has "-R y" working
to their satisfaction?

- Ross


Quoting Daniel Templeton <Dan.Templeton at Sun.COM>:

> Smells a bit like 2344:
>
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=2344
>
> Daniel
>
>>> On Wed, 19 Sep 2007, Ross Dickson wrote:
>>>
>>>> Hello all.
>>>>
>>>> We've got a Red Hat cluster running N1GE 6.0u9.  We've got resource
>>>> reservation turned on:
>>>>
>>>> % qconf -ssconf | grep reservation
>>>> max_reservation                   5
>>>>
>>>> ...and four jobs in the waiting list with "-R y".  Here's one:
>>>>
>>>> % qstat -j 3568 | grep reserv
>>>> reserve:                    y
>>>>
>>>> But since it went in on Sept 14, other jobs (of lower priority!) have
>>>> been submitted and scheduled. Here are some highlights from qstat:
>>>>
>>>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>>>> -----------------------------------------------------------------------------------------------------------------
>>>> ....
>>>>  3566 0.52079 rs1.90_cmc itamblyn     r     09/18/2007 11:04:59 all.q at cl026.smu.acenet.ca          4
>>>>  3668 0.52079 L099A      mcoates      r     09/18/2007 12:27:44 all.q at cl027.smu.acenet.ca          4
>>>>  3563 0.52079 rs1.90_cmc itamblyn     r     09/13/2007 15:52:23 all.q at cl028.smu.acenet.ca          4
>>>>  3667 0.52079 L022       mcoates      r     09/18/2007 12:27:44 all.q at cl029.smu.acenet.ca          4
>>>> ....
>>>>  3568 0.60500 Metis      kghazino     qw    09/14/2007 13:55:52                                     20
>>>> ....
>>>>
>>>> Note the start times on 3566, 3667, 3668.  When I set "params 
>>>> MONITOR=1" in qconf -msconf, I can see that 3568 is reserving cpus:
>>>>
>>>> % tail -3 /opt/n1ge6u9/default/common/schedule
>>>> 3568:1:RESERVING:1190217135:660:Q:all.q at cl021.smu.acenet.ca:slots:1.000000
>>>> 3568:1:RESERVING:1190217135:660:Q:all.q at cl034.smu.acenet.ca:slots:1.000000
>>>> 3568:1:RESERVING:1190217135:660:Q:all.q at cl020.smu.acenet.ca:slots:1.000000
>>>>
>>>> This looks suspiciously like a case mentioned on this mailing list in
>>>> Dec 2006 by Jean-Paul Minet, but no answer to his final query appears in
>>>> the archives.  Why are the smaller jobs getting in front of the
>>>> reserving job? What am I missing?
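
For anyone wanting to check when a reservation like the one above is actually booked, here is a minimal Python sketch that splits the three RESERVING records into named fields. It assumes the nine-field, colon-separated layout that sched_conf(5) describes for the MONITOR output (job id, task id, state, start time, duration, level character, object name, resource name, utilization) and reads the start time as seconds since the epoch. The names ScheduleRecord and parse_schedule_line are made up for illustration and are not part of Grid Engine. The sample lines are copied as they appear in this archive, where " at " appears to stand in for "@".

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class ScheduleRecord(NamedTuple):
    """One record of the scheduler's MONITOR output (assumed layout)."""
    job_id: int
    task_id: int
    state: str           # e.g. RESERVING
    start: datetime      # start time, interpreted as epoch seconds (UTC)
    duration: timedelta  # duration field, interpreted as seconds
    level: str           # level character, e.g. Q for a queue instance
    object_name: str     # e.g. the queue instance name
    resource: str        # e.g. slots
    utilization: float


def parse_schedule_line(line: str) -> ScheduleRecord:
    """Split one colon-separated record into typed fields."""
    fields = line.strip().split(":")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    job, task, state, start, dur, level, obj, res, util = fields
    return ScheduleRecord(
        job_id=int(job),
        task_id=int(task),
        state=state,
        start=datetime.fromtimestamp(int(start), tz=timezone.utc),
        duration=timedelta(seconds=int(dur)),
        level=level,
        object_name=obj,
        resource=res,
        utilization=float(util),
    )


# The three records quoted above, verbatim from the archived post.
SAMPLE = """\
3568:1:RESERVING:1190217135:660:Q:all.q at cl021.smu.acenet.ca:slots:1.000000
3568:1:RESERVING:1190217135:660:Q:all.q at cl034.smu.acenet.ca:slots:1.000000
3568:1:RESERVING:1190217135:660:Q:all.q at cl020.smu.acenet.ca:slots:1.000000
"""

records = [parse_schedule_line(ln) for ln in SAMPLE.splitlines() if ln.strip()]

for rec in records:
    end = rec.start + rec.duration
    print(f"job {rec.job_id} {rec.state} {rec.utilization:g} {rec.resource} "
          f"on {rec.object_name} from {rec.start.isoformat()} "
          f"until {end.isoformat()}")
```

Putting the start time and duration into readable form makes it easier to compare the reserved window against the start times that qstat reports for the jobs that got in ahead.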


