[GE users] Resource Reservation Issue
matthew.bradford at eds.com
Thu Sep 11 12:21:39 BST 2008
We are running SGE 6.0u8 and have a problem with how resource reservation works.
We have a full cluster running mainly parallel jobs of various sizes from 1 node to 16 nodes.
We allow only 2 jobs at a time to run with the reserve flag set.
Users aren't specifying an h_rt, and the default runtime is set to 480 hours.
Occasionally, we set the reserve flag on important jobs to prevent resource starvation, and we push those jobs to the top of the pending queue.
When we switch on scheduler monitoring, what we appear to see is the scheduler selecting nodes for the reserved job but then never amending that selection, even when there are free nodes in the system.
While testing this, we saw no difference when we explicitly set h_rt to 480 hours for all jobs and then reduced h_rt for specific jobs. We expected the scheduler to recalculate and select nodes where it knows that jobs are nearly finished.
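To make that expectation concrete, here is a minimal sketch in Python of the behaviour we expected. It is not SGE's actual scheduling algorithm, only an illustration; the function name `earliest_reservation`, the node names and the times are all hypothetical.

```python
from typing import Dict, List, Tuple


def earliest_reservation(
    node_free_at: Dict[str, float], nodes_needed: int
) -> Tuple[float, List[str]]:
    """Return (start_hour, nodes) for the earliest slot with enough nodes.

    node_free_at maps a node name to the hour at which it becomes free:
    0 means free now, otherwise the remaining h_rt of the job running there.
    This is an illustrative model, not how the SGE scheduler is implemented.
    """
    if nodes_needed < 1:
        raise ValueError("job must request at least one node")
    if nodes_needed > len(node_free_at):
        raise ValueError("job needs more nodes than the cluster has")
    # Sort nodes by the time they become free (name breaks ties),
    # then take the first N: they give the earliest possible start.
    ordered = sorted(node_free_at.items(), key=lambda kv: (kv[1], kv[0]))
    chosen = ordered[:nodes_needed]
    # The job can only start once the last of the chosen nodes is free.
    start = max(free_at for _, free_at in chosen)
    return start, [name for name, _ in chosen]


# Every job uses the 480 hour default: the reservation lands far in the future.
all_default = {"n1": 480, "n2": 480, "n3": 480, "n4": 480}

# Two jobs carry a shorter h_rt: we expected the reservation to move to them.
shorter_h_rt = {"n1": 480, "n2": 10, "n3": 12, "n4": 480}

# Two nodes are already free: we expected them to be picked immediately.
some_free = {"n1": 480, "n2": 480, "n3": 0, "n4": 0}
```

Recomputing on every scheduling run, a 2-node job would be placed on `n2` and `n3` in the second case and on `n3` and `n4` in the third, instead of staying on whatever nodes were chosen first.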
Any help would be much appreciated.