[GE users] resource reservation not working

Andreas.Haas at Sun.COM Andreas.Haas at Sun.COM
Wed Sep 26 15:28:54 BST 2007


On Wed, 26 Sep 2007, Ross Dickson wrote:

> Hello, Grid Engineers.
>
> I think I was mistaken about reservation not working. It just doesn't
> work the way I thought it would.  What I expected was that, as resources
> (slots) came free, the scheduler would set them aside for the reserving job
> until it had accumulated enough to run it.  Instead what happens is that the
> scheduler picks an arbitrary list of nodes that may *or may not* have free
> slots, and sets those slots aside as they come free.  If slots come free
> that are *not* on this preselected list, they are cheerfully assigned to other
> jobs, even those of lower priority than the reserving job.

Could it be that these other nodes were not actually suitable for this
large parallel job? Possible reasons include: 'oneper' is not contained
in the "pe_list" of the corresponding queue instances; your job has
resource requests that can be satisfied only on a subset of the queue
instances; or load thresholds are in effect on these queue instances.

>
> The indirect evidence of this was right there in the monitor file
> ($SGE_ROOT/$SGE_CELL/common/schedule) when I had it turned on, but
> I looked right past it:  The reserving job has a list of queue instances
> associated with it:
>
> 3568:1:RESERVING:1190724115:660:P:oneper:slots:20.000000
> 3568:1:RESERVING:1190724115:660:Q:all.q@cl023:slots:1.000000
> 3568:1:RESERVING:1190724115:660:Q:all.q@cl026:slots:1.000000
>
> ...and the list never changes!   I suspect now that what happened earlier was
> that a node *not* on the reserved list came free, and the job I thought was
> violating the reservation policy was scheduled there.  That's certainly what
> happened with some jobs that were scheduled last night.
>
> I suppose there ought to be a request-for-enhancement about this:  If
> the scheduler were smart enough to glom resources *as they became available*,
> rather than preselecting them (who knows how?), then reservation would
> probably be a more effective function.

Resource reservations for jobs are computed anew in each scheduling
interval, so the RFE is in fact already implemented ;-)
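
One way to check this against the monitor file is to compare the queue
instances reserved for a job from one scheduling interval to the next.
The following is only a minimal sketch under some assumptions: that each
record has the nine colon-separated fields seen in the lines quoted above
(job, task, state, start time, duration, level, object, resource,
utilization), that a line of eight colons separates scheduling intervals,
and that queue instances are written queue@host. The function names are
made up for illustration, and the second interval in SAMPLE is invented
data, not real scheduler output.

```python
INTERVAL_MARK = "::::::::"  # assumed separator between scheduling intervals

# First interval: records quoted in this thread.
# Second interval: hypothetical data, invented for illustration only.
SAMPLE = """\
::::::::
3568:1:RESERVING:1190724115:660:P:oneper:slots:20.000000
3568:1:RESERVING:1190724115:660:Q:all.q@cl023:slots:1.000000
3568:1:RESERVING:1190724115:660:Q:all.q@cl026:slots:1.000000
::::::::
3568:1:RESERVING:1190724130:660:P:oneper:slots:20.000000
3568:1:RESERVING:1190724130:660:Q:all.q@cl023:slots:1.000000
3568:1:RESERVING:1190724130:660:Q:all.q@cl031:slots:1.000000
"""


def split_intervals(text):
    """Group non-empty record lines by scheduling interval."""
    intervals, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line == INTERVAL_MARK:
            if current:
                intervals.append(current)
            current = []
        elif line:
            current.append(line)
    if current:
        intervals.append(current)
    return intervals


def reserved_queue_instances(records, job_id):
    """Return the queue instances holding slot reservations for job_id."""
    reserved = set()
    for record in records:
        fields = record.split(":")
        if len(fields) != 9:
            continue  # skip anything that does not match the assumed layout
        jid, _task, state, _start, _dur, level, obj, resource, _util = fields
        if (jid == str(job_id) and state == "RESERVING"
                and level == "Q" and resource == "slots"):
            reserved.add(obj)
    return reserved


def reservation_history(text, job_id):
    """One set of reserved queue instances per scheduling interval."""
    return [reserved_queue_instances(r, job_id) for r in split_intervals(text)]


if __name__ == "__main__":
    for n, queues in enumerate(reservation_history(SAMPLE, 3568), start=1):
        print(n, sorted(queues))
```

To use it on a real system, read $SGE_ROOT/$SGE_CELL/common/schedule into
a string and pass it in place of SAMPLE. If the sets differ between
intervals, the reservation is being recomputed; if they stay identical,
the other nodes were probably not eligible for the job.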

Regards,
Andreas
