[GE users] resource reservation not working

Andreas.Haas at Sun.COM Andreas.Haas at Sun.COM
Wed Sep 19 17:49:54 BST 2007


Hi Ross,

Are you sure 3568 also had the higher priority at the time these smaller 
jobs were dispatched? Could it be that 3568 got no reservation in the 
meantime because of the small max_reservation of 5? How do you ensure that 
jobs like 3568 get a high priority? I would expect that you are using an 
urgency contribution of 1000 for the 'slots' resource. Is there any other 
resource with a significant urgency contribution?

Which weights are you using for the priorities? Please check with:

  # qconf -ssconf | egrep "weight_urgency|weight_priority|weight_ticket"
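
For context on why these three weights matter: as described in sge_priority(5), the scheduler derives the "prior" value shown by qstat as a weighted sum of the normalized POSIX priority, urgency, and ticket values. A minimal Python sketch of that sum follows. The function name is hypothetical, and the default weights (1.0, 0.1, 0.01) are assumed to be the stock sched_conf values, not values taken from Ross's cluster.

```python
def combined_priority(npprio, nurg, ntckts,
                      weight_priority=1.0,   # assumed stock default
                      weight_urgency=0.1,    # assumed stock default
                      weight_ticket=0.01):   # assumed stock default
    """Weighted sum of the three normalized (0..1) priority inputs.

    npprio  -- normalized POSIX priority (a POSIX priority of 0 maps to 0.5)
    nurg    -- normalized urgency (highest-urgency pending job gets 1.0)
    ntckts  -- normalized ticket amount
    """
    return (weight_priority * npprio
            + weight_urgency * nurg
            + weight_ticket * ntckts)


if __name__ == "__main__":
    # Hypothetical inputs: POSIX priority 0, highest urgency, mid ticket share.
    print(f"{combined_priority(0.5, 1.0, 0.5):.5f}")
```

With those assumed defaults, inputs of 0.5, 1.0, and 0.5 yield 0.605, which is consistent with the 0.60500 shown for job 3568. That is only one possible decomposition; the actual weights on the cluster are what the qconf command above would reveal.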

Regards,
Andreas


On Wed, 19 Sep 2007, Ross Dickson wrote:

> Hello all.
>
> We've got a Red Hat cluster running N1GE 6.0u9.  We've got resource 
> reservation turned on:
>
> % qconf -ssconf | grep reservation
> max_reservation                   5
>
> ...and four jobs in the waiting list with "-R y".  Here's one:
>
> % qstat -j 3568 | grep reserv
> reserve:                    y
>
> But since it went in on Sept 14, other jobs (of lower priority!) have been 
> submitted and scheduled. 
> Here are some highlights from qstat:
>
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
> ....
>  3566 0.52079 rs1.90_cmc itamblyn     r     09/18/2007 11:04:59 all.q at cl026.smu.acenet.ca          4
>  3668 0.52079 L099A      mcoates      r     09/18/2007 12:27:44 all.q at cl027.smu.acenet.ca          4
>  3563 0.52079 rs1.90_cmc itamblyn     r     09/13/2007 15:52:23 all.q at cl028.smu.acenet.ca          4
>  3667 0.52079 L022       mcoates      r     09/18/2007 12:27:44 all.q at cl029.smu.acenet.ca          4
> ....
>  3568 0.60500 Metis      kghazino     qw    09/14/2007 13:55:52                                     20
> ....
>
> Note the start times on 3566, 3667, 3668.  When I set "params MONITOR=1" in 
> qconf -msconf, I can see that 3568 is reserving cpus:
>
> % tail -3 /opt/n1ge6u9/default/common/schedule
> 3568:1:RESERVING:1190217135:660:Q:all.q at cl021.smu.acenet.ca:slots:1.000000
> 3568:1:RESERVING:1190217135:660:Q:all.q at cl034.smu.acenet.ca:slots:1.000000
> 3568:1:RESERVING:1190217135:660:Q:all.q at cl020.smu.acenet.ca:slots:1.000000
>
> This looks suspiciously like a case mentioned on this mailing list in Dec 
> 2006 by Jean-Paul Minet, but no answer to his final query appears in the 
> archives.  Why are the smaller jobs getting in front of the reserving job? 
> What am I missing? 
>
> -- 
> Ross Dickson         HPC Consultant
> ACEnet               http://www.ace-net.ca
> +1 902 494 6710      Skype: ross.m.dickson
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
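
The RESERVING lines in the quoted schedule file can be decoded with a few lines of Python. The field order assumed here (job, task, state, start time, duration, level, object, resource, utilization) follows the description of the MONITOR output in sched_conf(5); the class and function names are illustrative only.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ScheduleRecord(NamedTuple):
    job_id: int
    task_id: int
    state: str          # e.g. RESERVING, STARTING, RUNNING
    start: datetime     # start_time field, interpreted as epoch seconds (UTC)
    duration_s: int     # assumed duration of the slot usage, in seconds
    level: str          # single character; Q appears to denote a queue
    object_name: str    # queue instance, host, etc.
    resource: str
    utilization: float


def parse_schedule_line(line: str) -> ScheduleRecord:
    """Split one colon-separated schedule line into typed fields."""
    fields = line.strip().split(":")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    job, task, state, start, duration, level, obj, resource, util = fields
    return ScheduleRecord(
        job_id=int(job),
        task_id=int(task),
        state=state,
        start=datetime.fromtimestamp(int(start), tz=timezone.utc),
        duration_s=int(duration),
        level=level,
        object_name=obj,
        resource=resource,
        utilization=float(util),
    )


if __name__ == "__main__":
    sample = ("3568:1:RESERVING:1190217135:660:Q:"
              "all.q at cl021.smu.acenet.ca:slots:1.000000")
    rec = parse_schedule_line(sample)
    print(rec.job_id, rec.state, rec.start.isoformat(), rec.duration_s)
```

Applied to the first quoted line, this gives a reservation for job 3568 starting at 2007-09-19 15:52:15 UTC with a duration of 660 seconds, one slot on all.q at cl021.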

http://gridengine.info/

