[GE dev] Reservation

andy andy.schwierskott at sun.com
Fri May 21 12:36:11 BST 2010


Hi,

> > The behavior you observe and what you've found out in the code is correct -
> > at least that's how it's designed and intended to work. The motivation is
> > that it's not known when an alarm state goes away - so it's not much
> > different from a queue being in "unknown" state (I think the Advance
> > Reservation does not select queues in alarm state as well). This behavior
> > becomes problematic when you have configured your SGE cluster so that alarm
> > states are "normal" when the queues are fully busy. In SGE's philosophy an
> > alarm state is an exceptional, non-normal state.
>
>
> OK, thanks for pointing this out. I've now changed our configuration to
> avoid alarms during normal operation. The new configuration has been
> partly successful: the queue list passed down by dispatch_jobs() is now
> much more realistic. However, no reservations have been made during a test
> run of a few minutes' length. From a debug trace, I gathered the
> following: Upon qmaster startup, not all queues are present. When calling
> qstat -f immediately, several queues would appear with load "values" of
> N/A. Therefore, dispatch_jobs() decides that this job can never be run,
> and calls sge_reject_category(). In subsequent scheduling runs, no
> reservation is tried for that job. Will the CT_reservation_rejected flag
> be cleared at some point? I've grepped for it in the sources and could not
> find anything, but maybe there is a more devious mechanism at work?

I guess (I'm not a developer of this part of the code) that the reason you
don't see CT_reservation_rejected reset to 0 is that in every scheduling
run the scheduler works on a new copy of most of the data structures.
I'd assume this value is therefore initialized afresh with every scheduler run.

When a host is in unknown state and becomes known, qmaster sends all the
corresponding update events to the scheduler when it triggers the next
scheduler run. This should cause the queue instances on this host to be
considered in the next scheduling run. In other words: reservations are not
persistent across scheduling runs - in every scheduling run the reservations
are computed again.
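
To make the idea concrete, here is a toy model in Python. It is not SGE
code: the class names, the scheduling_run() function and the host name are
all made up for illustration, and reservation_rejected merely stands in for
CT_reservation_rejected. It only shows why a flag set on a per-run copy does
not need an explicit reset.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    # Hypothetical stand-in for the CT_reservation_rejected flag.
    reservation_rejected: bool = False


@dataclass
class MasterState:
    # Hypothetical model of the data the scheduler copies for each run.
    known_hosts: set = field(default_factory=set)
    categories: dict = field(default_factory=dict)


def scheduling_run(master, needed_host):
    """One scheduling run operating on a fresh copy of the master data."""
    snapshot = copy.deepcopy(master)  # flags start from their initial values
    results = {}
    for cat in snapshot.categories.values():
        if needed_host not in snapshot.known_hosts:
            # The rejection is recorded on the copy only.
            cat.reservation_rejected = True
        results[cat.name] = "rejected" if cat.reservation_rejected else "reserved"
    return results


master = MasterState(categories={"cat1": Category("cat1")})

first = scheduling_run(master, "node01")   # host still unknown
master.known_hosts.add("node01")           # update event: host became known
second = scheduling_run(master, "node01")  # flag was never set on the master

print(first, second)
```

In this model the rejection from the first run is discarded with the copy, so
the second run decides again from scratch once the host is known.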

Do you see from the "schedule" file that no reservations are made for queues
on hosts which became "known" after qmaster has started? The "schedule" file
in which the reservation decisions are logged can be enabled by setting

  params MONITOR=1

in the scheduler configuration (sched_conf(5)).
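
A small Python sketch for picking the reservation entries out of the
"schedule" file follows. It assumes the record layout described in
sched_conf(5), i.e. nine colon-separated fields
(jobid:taskid:state:start_time:duration:level:object:resource:utilization)
with RESERVING as the state of a reservation; please check this against your
version. The function name and the sample lines are invented for
illustration and are not real scheduler output.

```python
def reserving_entries(lines):
    """Return the RESERVING records found in schedule-file lines.

    Assumes nine colon-separated fields per record; anything else
    (including the separator line between scheduling runs) is skipped.
    """
    entries = []
    for line in lines:
        fields = line.strip().split(":")
        if len(fields) != 9 or fields[2] != "RESERVING":
            continue
        entries.append({
            "job": fields[0],
            "task": fields[1],
            "start_time": int(fields[3]),
            "level": fields[5],
            "object": fields[6],
            "resource": fields[7],
        })
    return entries


# Invented sample records, for illustration only.
sample = [
    "3127:1:STARTING:1274441771:3660:Q:all.q@node01:slots:1.000000",
    "3128:1:RESERVING:1274445431:3660:H:node02:mem_free:2.0",
    "3128:1:RESERVING:1274445431:3660:Q:all.q@node02:slots:1.000000",
    "::::::::",
]

found = reserving_entries(sample)
reserved_objects = sorted({e["object"] for e in found})
print(reserved_objects)
```

To run it against a real file, pass an open file object instead of the
sample list, e.g. reserving_entries(open(path)), and compare the objects
listed with the hosts that became known after qmaster startup.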

Andy

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=39&dsMessageId=258081
