[GE users] Resource reservation fails for large job

s_kreidl sabine.kreidl at uibk.ac.at
Wed Jun 3 13:20:46 BST 2009


We are currently running SGE 6.2u1 on a homogeneous 200-core cluster with 8-core nodes. At the moment, one user is filling the cluster with 4/8-core OpenMP jobs, and a 100-core job that is guaranteed a higher priority and was submitted with -R y is starving.

I have excerpted a passage from $SGE_ROOT/$SGE_CELL/common/schedule (MONITOR=1) that clearly shows where things go wrong; see the attachment.
Only the last six lines are relevant. Although a reservation of 8 slots exists for the queue instance par.q@lcc09-be1t12, the small job is nevertheless started on this queue instance within the same scheduling interval.
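
The manual check described above could be automated roughly as follows. This is only a sketch: it assumes the schedule file layout as I understand it from sched_conf(5), namely nine colon-separated fields (jobid:taskid:state:start_time:duration:level:object:resource:utilization) and a line consisting only of colons closing each scheduling run. The names parse_record and find_overlaps are made up for illustration. The script knows nothing about slot capacity, so a hit means "look at this by hand", not "this is a bug"; a small job that ends before the reservation begins is legitimate backfilling and is not reported.

```python
from collections import namedtuple

# Assumed field order of one schedule-file record (see sched_conf(5), MONITOR).
Record = namedtuple(
    "Record", "job task state start duration level obj resource utilization"
)


def parse_record(line):
    """Turn one schedule-file line into a Record, or None if it does not fit."""
    fields = line.strip().split(":")
    if len(fields) != 9:
        return None
    job, task, state, start, duration, level, obj, resource, util = fields
    try:
        return Record(job, task, state, int(start), int(duration),
                      level, obj, resource, float(util))
    except ValueError:
        return None


def find_overlaps(lines):
    """Report (reserving_job, starting_job, object, resource) for every pair
    where, within one scheduling run, a different job is STARTING on an object
    during the time window another job is RESERVING on it."""
    hits = []

    def scan(records):
        reserving = [r for r in records if r.state == "RESERVING"]
        starting = [r for r in records if r.state == "STARTING"]
        for res in reserving:
            for st in starting:
                same_target = (res.obj, res.resource) == (st.obj, st.resource)
                overlap = (res.start < st.start + st.duration
                           and st.start < res.start + res.duration)
                if same_target and res.job != st.job and overlap:
                    hits.append((res.job, st.job, res.obj, res.resource))

    run = []
    for line in lines:
        stripped = line.strip()
        if stripped and set(stripped) == {":"}:
            # Assumed end-of-run separator: evaluate the run, start a new one.
            scan(run)
            run = []
            continue
        record = parse_record(stripped)
        if record is not None:
            run.append(record)
    scan(run)  # a trailing run that has no closing separator
    return hits
```

It could be fed the file directly, e.g. find_overlaps(open(path)), where path points to $SGE_ROOT/$SGE_CELL/common/schedule.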

I have checked the mailing list as much as I could. I can exclude issue #2896, as the relevant queue has the same runtime limits as the global configuration, namely 10/14 days. I can also exclude issue #2344, as all jobs are running in the same queue "par.q", which is not subordinate to any other.

I have set "Maximum Reservation" to 200 just in case (though I don't really know what it means: the number of jobs to make reservations for, the number of "RESERVING" lines in the schedule file per scheduler run, ...?), and I have double-checked that the small jobs never have a higher priority than the large job.
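
For reference, the scheduler settings mentioned here can be checked from the output of "qconf -ssconf". The sketch below assumes that this output is one "name value" pair per line and that "Maximum Reservation" in qmon corresponds to the max_reservation entry, which, as far as I read sched_conf(5), disables reservation scheduling when it is 0. The function names are made up for illustration.

```python
def parse_sched_conf(text):
    """Parse 'name value' lines, as printed by `qconf -ssconf`, into a dict."""
    conf = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def reservation_warnings(conf):
    """Return human-readable warnings about reservation-related settings."""
    warnings = []
    # Assumption: max_reservation 0 (or a missing entry) means no reservations.
    if int(conf.get("max_reservation", "0")) == 0:
        warnings.append("max_reservation is 0, so no reservations are scheduled")
    # MONITOR must be switched on for the schedule file to be written.
    params = conf.get("params", "none").replace(",", " ").split()
    if not any(p.upper() in ("MONITOR=1", "MONITOR=TRUE") for p in params):
        warnings.append("MONITOR is not enabled in params")
    return warnings
```

The text can be captured with subprocess.run(["qconf", "-ssconf"], capture_output=True, text=True).stdout on a host where qconf is available.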

What additionally strikes me is that "qstat -g c" never shows any reserved slots.

What am I doing wrong? Or can anyone give me a clear description of how to reliably implement resource reservation?

I'd be really grateful for any advice.

Thanks in advance,



    [ Attachment: "sge_reservation_failure", Application/OCTET-STREAM, 2.9 KB ]
