[GE users] Resource reservation fails for large job
sabine.kreidl at uibk.ac.at
Wed Jun 3 13:20:46 BST 2009
We are currently running SGE 6.2u1 on a homogeneous 200-core cluster with 8-core nodes. At the moment a user is filling the cluster with 4- and 8-core OpenMP jobs, and a 100-core job that has a guaranteed higher priority and was submitted with -R y is starving.
I have excerpted a passage from $SGE_ROOT/$SGE_CELL/common/schedule (written with MONITOR=1) that clearly shows where things go wrong; see the attachment.
Only the last six lines are relevant. Although a reservation of 8 slots exists for the queue instance par.q@lcc09-be1t12, the small job is nevertheless started on this queue instance within the same scheduling interval.
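For anyone who wants to look for the same pattern in their own schedule file, here is a minimal Python sketch. It rests on several assumptions that are not confirmed in this post: that each record has nine colon-separated fields (job, task, state, start time, duration, level, object, resource, utilization), that a line consisting only of colons closes one scheduling run, and that the states are named RESERVING and STARTING. The helper names, the sample job numbers, and the timestamps are made up for illustration. The check also ignores slots held by jobs that are already running, so it is only a rough filter.

```python
from collections import namedtuple

# One line of the schedule file, under the assumed nine-field layout.
Record = namedtuple(
    "Record", "job task state start duration level obj resource util"
)


def parse_runs(lines):
    """Split schedule-file lines into scheduling runs (lists of Record)."""
    runs, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if set(line) == {":"}:
            # Assumed separator: a line of colons closes one scheduling run.
            runs.append(current)
            current = []
            continue
        fields = line.split(":")
        if len(fields) != 9:
            continue  # skip anything that does not match the assumed layout
        job, task, state, start, duration, level, obj, resource, util = fields
        current.append(
            Record(int(job), int(task), state, int(start), int(duration),
                   level, obj, resource, float(util))
        )
    if current:
        runs.append(current)
    return runs


def find_conflicts(run, capacity):
    """Return (starting, reserving) pairs of different jobs that overlap in
    time on the same object and resource and together exceed capacity."""
    reserving = [r for r in run if r.state == "RESERVING"]
    starting = [r for r in run if r.state == "STARTING"]
    conflicts = []
    for s in starting:
        for r in reserving:
            same_place = (s.obj, s.resource) == (r.obj, r.resource)
            overlap = (s.start < r.start + r.duration
                       and r.start < s.start + s.duration)
            if (s.job != r.job and same_place and overlap
                    and s.util + r.util > capacity):
                conflicts.append((s, r))
    return conflicts


# Made-up sample: job 101 starts where job 100 holds a reservation.
sample = [
    "101:1:STARTING:1244030000:864000:Q:par.q@lcc09-be1t12:slots:8.000000",
    "100:1:RESERVING:1244100000:864000:Q:par.q@lcc09-be1t12:slots:8.000000",
    "::::::::",
]

for run in parse_runs(sample):
    for s, r in find_conflicts(run, capacity=8):
        print(s.job, "starts on", s.obj, "despite a reservation for job", r.job)
```

The capacity of 8 matches the 8-core nodes described above; on a real file it would have to come from the queue configuration. A pair is reported only when the two time windows overlap, because a small job that finishes before the reservation begins would be legitimate backfilling.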
I have searched the mailing list as thoroughly as I could. I can rule out issue #2896, because the relevant queue has the same runtime limits as the global configuration, namely 10/14 days. I can also rule out issue #2344, because all jobs run in the same queue, "par.q", which is not subordinate to any other.
I have set "Maximum Reservation" to 200 just in case (though I don't really know what it means: the number of jobs to make reservations for, the number of "RESERVING" lines in the schedule file per scheduler run, ...?), and I have double-checked that the small jobs never have a higher priority than the large job.
What additionally strikes me is that "qstat -g c" never shows any reserved slots.
What am I doing wrong? Or can anyone give me a clear description of how to implement resource reservation reliably?
I'd be really grateful for any advice.
Thanks in advance,
[ Attachment: "sge_reservation_failure", Application/OCTET-STREAM, 2.9 KB ]