[GE dev] Reservation

andy andy.schwierskott at sun.com
Thu May 20 07:20:45 BST 2010


Hi,

> in our installation, reservations (the -R y kind, not advance
> reservations) work, at best, intermittently. Therefore, I've generated
> some debug traces and looked through the code (V62u5_TAG). However, there
> is one point that is not quite clear to me:
>
> in schedd/scheduler/dispatch_jobs(), the list of queues is filtered in
> several steps. At one point, sge_split_queue_load(..., QU_load_thresholds)
> removes overloaded queues. As far as I can see, these are never re-added,
> so everything dependent on the queue list will have to work with the
> reduced list of non-overloaded queues. In
> sge_select_queue/parallel_tag_queues_suitable4job(), this queue list is
> used to count the number of available slots at some point in the future in
> order to make a reservation. Thus, it seems that queues overloaded *now*
> will be ignored when considering a situation in the future. What am I
> overlooking here?

What you observe, and what you found in the code, is correct - at least
that is how it is designed and intended to work. The motivation is that it
is not known when an alarm state will go away, so it is not much different
from a queue being in "unknown" state (I think Advance Reservation does not
select queues in alarm state either). This behavior becomes problematic
when you have configured your SGE cluster so that alarm states are "normal"
whenever the queues are fully busy. In SGE's philosophy an alarm state is an
exceptional, non-normal state - the goal should be to define queue slots as
upper limits, so that when all queue slots are used and the jobs are busy,
the queues are not in alarm state.
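
To illustrate the effect, here is a minimal Python sketch. It is not the
scheduler code: the Queue class, the single load value per queue, the
"load >= threshold" test and the helper names are all assumptions made up
for this example, and 1.75 is only an illustrative threshold. It shows how
dropping queues that are overloaded *now* shrinks the slot count that a
later reservation step gets to see.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    slots: int             # configured slot limit of the queue
    load: float            # current load value (hypothetical single metric)
    load_threshold: float  # alarm threshold (hypothetical single value)


def split_overloaded(queues):
    """Split queues into (usable, overloaded) based on the *current* load.

    Assumption for this sketch: a queue counts as overloaded when its
    load is at or above its threshold.
    """
    usable = [q for q in queues if q.load < q.load_threshold]
    overloaded = [q for q in queues if q.load >= q.load_threshold]
    return usable, overloaded


def reservable_slots(queues):
    """Count the slots a reservation could be built from."""
    return sum(q.slots for q in queues)


queues = [
    Queue("a.q", slots=8, load=0.5, load_threshold=1.75),
    Queue("b.q", slots=8, load=2.0, load_threshold=1.75),  # in alarm now
    Queue("c.q", slots=4, load=1.0, load_threshold=1.75),
]

# The overloaded queues are removed once and never re-added, so the
# reservation step only ever sees the reduced list.
usable, overloaded = split_overloaded(queues)

print("all slots:       ", reservable_slots(queues))
print("reservable slots:", reservable_slots(usable))
print("ignored queues:  ", [q.name for q in overloaded])
```

In this toy setup b.q contributes nothing to a reservation even if its
alarm would have cleared long before the reserved start time.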

I fully understand there can be quite different views on this behavior:-)

Changing the scheduler code would likely be non-trivial since, as you can
see from the code, reservations are made during a normal scheduling run in
which non-reservation jobs are dispatched as well - and of course you don't
want jobs to be dispatched to queues that are in alarm state.

Andy

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=39&dsMessageId=257940
