[GE dev] Reservation
Ansgar.Esztermann at mpi-bpc.mpg.de
Fri May 21 13:47:00 BST 2010
>> N/A. Therefore, dispatch_jobs() decides that this job can never be run,
>> and calls sge_reject_category(). In subsequent scheduling runs, no
>> reservation is tried for that job. Will the CT_reservation_rejected flag
>> be cleared at some point? I've grepped for it in the sources and could not
>> find anything, but maybe there is a more devious mechanism at work?
> I guess (I'm not a developer of this part of the code) the reason why you
> don't see CT_reservation_rejected reset to 0 is because in every scheduling
> run the scheduler is working on a new copy of most of the data structures.
> I'd assume with every scheduler run this value is thus initialized properly.
As far as I understand, this particular flag is meant to live through more than one scheduling run. Thus, sge_reset_job_category() does reset the CT_rejected flag (a job category that could not start last run may well do so now), but not CT_reservation_rejected (if a reservation could not be found last run, neither can one be found this run -- unless something substantial changes, such as more resources being added to the cluster).
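To make the asymmetry concrete, here is a toy model of how I read the reset logic. It is a sketch only: the Category class, its field names and reset_category() are made up for illustration and are not the actual Grid Engine data structures or functions.

```python
from dataclasses import dataclass


@dataclass
class Category:
    """Toy stand-in for a job category (hypothetical, not the real structure)."""
    rejected: bool = False              # models CT_rejected
    reservation_rejected: bool = False  # models CT_reservation_rejected


def reset_category(cat: Category) -> None:
    """Models the per-run reset as described above: only `rejected` is cleared."""
    cat.rejected = False
    # reservation_rejected is deliberately left untouched


cat = Category()

# Scheduling run 1: the job can neither start nor get a reservation.
cat.rejected = True
cat.reservation_rejected = True

# Scheduling run 2 begins with the reset.
reset_category(cat)
print(cat.rejected, cat.reservation_rejected)  # False True
```

In this model, nothing short of creating a new category object gets rid of the reservation flag once it has been set.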
> When a host is in unknown state and becomes known, qmaster sends all update
> events to the scheduler when it triggers the next scheduler run. This has
> to trigger that the queue instances on this host will be considered for the
> next scheduling. In other words: reservations are not persistent across
> scheduling runs - in every scheduling run the reservations are computed
True, unless CT_reservation_rejected is set for a certain job category, causing dispatch_jobs() to disable reservations.
> Do you see from the "schedule" file that no reservations are made for queue
> on hosts which became "known" after qmaster has started? The "schedule" file
Not quite. I see no reservations for a job submitted before qmaster has been restarted. I have then submitted another copy of the same job -- still no reservations. Finally, I have submitted a slightly altered jobfile, increasing the slot request from 128 to 132. This time, I do see reservations. This is the behaviour I would expect from a failure to reset the CT_reservation_rejected flag: the second job shares its category with the first (thus no reservation), whereas the third requires different resources, thus defining a new category.
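The three submissions can be replayed against a small model. Again, this is only a sketch of my hypothesis: the category key (just the slot request), the hosts_known argument and both function names are assumptions made for illustration; the real scheduler derives categories from the full set of job requests.

```python
# Hypothetical model: category key -> sticky "reservation rejected" flag.
categories = {}


def category_for(slots):
    """Jobs with identical requests map to the same category key."""
    key = ("slots", slots)
    categories.setdefault(key, False)
    return key


def try_reservation(key, hosts_known):
    """Return True if a reservation would be attempted and found."""
    if categories[key]:
        return False            # flag set in an earlier run: never retried
    if not hosts_known:
        categories[key] = True  # no reservation possible: flag the category
        return False
    return True


# Job 1: 128 slots, scheduled while the hosts are still in unknown state.
r1 = try_reservation(category_for(128), hosts_known=False)
# Job 2: identical request, submitted after the hosts became known.
r2 = try_reservation(category_for(128), hosts_known=True)
# Job 3: 132 slots, which defines a new category.
r3 = try_reservation(category_for(132), hosts_known=True)

print(r1, r2, r3)  # False False True
```

The output matches what I see in the "schedule" file: the first two jobs get no reservation, the third does.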
Max-Planck-Institut für biophysikalische Chemie, Abteilung 105