[GE dev] Reservation

aeszter Ansgar.Esztermann at mpi-bpc.mpg.de
Thu May 20 17:10:04 BST 2010


On May 20, 2010, at 8:20, andy wrote:
> The behavior you observe and what you've found out in the code is correct -
> at least that's how it's designed and intended to work. The motivation is
> that it's not known when an alarm state goes away - so it's not much
> different from a queue being in "unknown" state (I think the Advance
> Reservation does not select queues in alarm state as well). This behavior
> becomes problematic when you configured your SGE cluster that alarm states
> are "normal" when the queues are fully busy. In SGE's philosophy an alarm
> state is an exceptional, non-normal state

OK, thanks for pointing this out. I've now changed our configuration to avoid alarms during normal operation. The new configuration has been partly successful: the queue list passed down by dispatch_jobs() is now much more realistic. However, no reservations have been made during a test run of a few minutes' length. From a debug trace, I gathered the following:
Upon qmaster startup, not all queues are present: running qstat -f immediately after startup shows several queues with load "values" of N/A. Therefore, dispatch_jobs() decides that this job can never be run, and calls sge_reject_category(). In subsequent scheduling runs, no reservation is attempted for that job. Will the CT_reservation_rejected flag be cleared at some point? I've grepped for it in the sources and could not find anything that clears it, but maybe there is a more devious mechanism at work?
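
To make the suspected behaviour concrete, here is a toy model of what the trace suggests. This is only a sketch, not the actual scheduler code: ToyCategory, toy_scheduling_run and the queue names are made up for illustration, and the reservation_rejected field merely plays the role of CT_reservation_rejected. The assumption is that the flag is set once, while the load values are still N/A, and is never reset afterwards.

```python
from dataclasses import dataclass


@dataclass
class ToyCategory:
    """Hypothetical stand-in for a job category; not an SGE data structure."""
    name: str
    # Plays the role of CT_reservation_rejected (assumed to be sticky).
    reservation_rejected: bool = False


def toy_scheduling_run(category, queue_loads):
    """One simplified scheduling run.

    queue_loads maps a queue name to its load value, or to None for "N/A".
    Returns "skipped", "rejected" or "reserved".
    """
    if category.reservation_rejected:
        # Once rejected, no reservation is attempted again.
        return "skipped"
    usable = [q for q, load in queue_loads.items() if load is not None]
    if not usable:
        # No queue looks usable: mark the whole category as rejected.
        category.reservation_rejected = True
        return "rejected"
    return "reserved"


cat = ToyCategory("parallel-job")
at_startup = {"q1": None, "q2": None}  # loads still N/A right after startup
later = {"q1": 0.3, "q2": 1.2}         # loads reported normally

results = [
    toy_scheduling_run(cat, at_startup),
    toy_scheduling_run(cat, later),
    toy_scheduling_run(cat, later),
]
print(results)
```

In this model the first run rejects the category and every later run skips it, even though the queues have become usable in the meantime. That matches what I see, unless the real flag is cleared somewhere I have not found.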



Ansgar Esztermann
Max-Planck-Institut für biophysikalische Chemie, Abteilung 105

