[GE users] RR w/o BF

Andreas.Haas at Sun.COM
Wed Jun 28 10:28:00 BST 2006


Hi Brady,

I may be wrong, but I believe what you and Sili are actually asking for is 
resource reservation without backfilling. That would generally prevent jobs 
from being assigned to resources that are reserved for another job.

On that occasion, let me direct your attention to the definitions of
RR, BF and AR in the "0. Introduction" section of

    http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/devel/rfe/resource_reservation.txt?content-type=text/plain

In many cases AR (an advance reservation explicitly requested for a fixed time
window) is mixed up with RR (the reservation the scheduler makes automatically
for pending jobs).

But anyway: if simply disabling backfilling would solve your problem,
that should be doable with moderate coding. The relevant things happen in

    double utilization_max(const lListElem *cr, u_long32 start_time, u_long32 duration)
    {
       const lListElem *rde;
       lListElem *start, *prev;
       double max = 0.0;
       u_long32 end_time = utilization_endtime(start_time, duration);

       DENTER(TOP_LAYER, "utilization_max");

       /* someone is asking for the current utilization */
       if (start_time == DISPATCH_TIME_NOW) {
          DEXIT;
          return lGetDouble(cr, RUE_utilized_now);
       }

    #if 0
       /* debug output of the resource diagram, disabled by default */
       utilization_print(cr, "the object");
    #endif

       /* locate the utilization entry at start_time, or the nearest one before it */
       utilization_find_time_or_prevstart_or_prev(lGetList(cr, RUE_utilized), start_time,
             &start, &prev);

       if (start) {
          max = lGetDouble(start, RDE_amount);
          rde = lNext(start);
       } else {
          if (prev) {
             max = lGetDouble(prev, RDE_amount);
             rde = lNext(prev);
          } else {
             rde = lFirst(lGetList(cr, RUE_utilized));
          }
       }

       /* now watch out for the maximum before end time */
       while (rde && end_time > lGetUlong(rde, RDE_time)) {
          max = MAX(max, lGetDouble(rde, RDE_amount));
          rde = lNext(rde);
       }

       DEXIT;
       return max;
    }

of libs/sched/sge_resource_utilization.c. When you look at that function,
you'll notice that the search for the maximum utilization always stops once

    end_time ( = start_time + duration)

is reached. If I'm not heavily mistaken, backfilling could be disabled by
determining the utilization maximum not only up to end_time but also beyond it.
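
To make this concrete, here is a minimal, untested sketch of such a variant.
The function name utilization_max_no_backfill and the idea of simply dropping
the duration cut-off are just my own example; everything else reuses the cull
list macros from the excerpt above and assumes it would live in the same file.
Because the scan then covers the entire resource diagram, a job would only be
dispatched if it never collides with any reservation, no matter how far in the
future, which is what disabling backfilling amounts to:

    /* Sketch only: like utilization_max(), but the scan is NOT cut off at
     * end_time, so the maximum is taken over the whole resource diagram. */
    static double utilization_max_no_backfill(const lListElem *cr, u_long32 start_time)
    {
       const lListElem *rde;
       lListElem *start, *prev;
       double max = 0.0;

       DENTER(TOP_LAYER, "utilization_max_no_backfill");

       /* someone is asking for the current utilization */
       if (start_time == DISPATCH_TIME_NOW) {
          DEXIT;
          return lGetDouble(cr, RUE_utilized_now);
       }

       /* locate the utilization entry at start_time, or the nearest one before it */
       utilization_find_time_or_prevstart_or_prev(lGetList(cr, RUE_utilized), start_time,
             &start, &prev);

       if (start) {
          max = lGetDouble(start, RDE_amount);
          rde = lNext(start);
       } else if (prev) {
          max = lGetDouble(prev, RDE_amount);
          rde = lNext(prev);
       } else {
          rde = lFirst(lGetList(cr, RUE_utilized));
       }

       /* no end_time cut-off: walk the resource diagram to its very end */
       while (rde) {
          max = MAX(max, lGetDouble(rde, RDE_amount));
          rde = lNext(rde);
       }

       DEXIT;
       return max;
    }

Whether replacing the calls to utilization_max() with such a variant would be
sufficient in all code paths I cannot say offhand, but it should be a
reasonable starting point for an experiment.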

Regards,
Andreas

On Tue, 27 Jun 2006, Brady Catherman wrote:

> We don't use time limits either, because some jobs we run take seconds and
> others take months, and there is often no way to tell the difference between
> the two when submitting.
>
> What would be really nice is something that would give me the ability to tell
> GE to only look at x jobs from the top for scheduling. That way I could set
> it to act as though there is only 1 job in the queue at a time. Once that job
> is scheduled, it moves on to the next job. This is about the only way to
> make scheduling efficient in our environment. (Yes, I understand that this
> will slow down the job starting process a lot, but that isn't an issue for us.)
>
> Any way to make this doable?
>
>
> On Jun 27, 2006, at 12:08 PM, Reuti wrote:
>
>> Hi,
>> 
>> On 27.06.2006 at 20:25, Sili (wesley) Huang wrote:
>> 
>>> Hi Andreas,
>>> 
>>> If I recall correctly, h_rt is a hard limit on wall clock time. If I add
>>> -l h_rt=0:10:0 to the $SGE_ROOT/default/common/sge_request file, then jobs
>>> that run for more than 10 minutes of wall clock time will be killed. Of
>>> course, I can ask users to specify h_rt when using qsub, but this is not
>>> the way I want to go, because there are many long jobs (days to weeks) in
>>> our cluster and I do not want to add a layer of complexity for our users.
>>> 
>>> Is there any way to work around this problem without specifying the hard
>>> limit h_rt? E.g., is there any way I can configure the reservation so that
>>> all slots are reserved for it (so that no serial job can fill the released
>>> slots) but only some of them are used when the job is dispatched?
>> Can you have two queues, one only for parallel and one only for serial jobs?
>> This way you could a) suspend the serial jobs, or b) push them into the
>> background by setting the queue priorities (i.e. nice values) to 0 and +19.
>> 
>> Or, if you would like to drain the serial queue, this is working again:
>> 
>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=464
>> 
>> -- Reuti
>> 
>>> Cheers.
>>> 
>>> Best regards,
>>> Sili(wesley) Huang
>>> 
>>> Tuesday, June 27, 2006, 5:40:43 AM, you wrote:
>>> 
>>> Andreas> Most probably your suspicion is right. The parallel jobs'
>>> Andreas> reservation sure enough becomes valueless if the sequential jobs
>>> Andreas> do not finish at the time foreseen by the scheduler and
>>> Andreas> default_duration does not get enforced by Grid Engine. Have you
>>> Andreas> considered putting something like
>>> 
>>> Andreas>     -l h_rt=:10:
>>> 
>>> Andreas> into the cluster-wide sge_request(5) file?
>>> 
>>> Andreas> Regards,
>>> Andreas> Andreas
>>> 
>>> Andreas> On Mon, 26 Jun 2006, Sili (wesley) Huang wrote:
>>> 
>>>>> Hi Andreas,
>>>>> 
>>>>> I tried to observe what was going on with this strange behavior. It
>>>>> seems to me that the reservation is tied to the specified run length of
>>>>> a job. For example, in this record of monitoring (385889 is a parallel
>>>>> job with reservation enabled and high priority, and 385865 is a serial
>>>>> job):
>>>>> 
>>>>> [root common]#  cat schedule | egrep "385865|385889|::::::::"
>>>>> 
>>>>> ::::::::
>>>>> 385889:1:RESERVING:1151341235:3660:P:mpich:slots:12.000000
>>>>> 385889:1:RESERVING:1151341235:3660:G:global:ncpus_agerber:12.000000
>>>>> 385889:1:RESERVING:1151341235:3660:H:v60-n28:singular:2.000000
>>>>> 385889:1:RESERVING:1151341235:3660:H:v60-n65:singular:2.000000
>>>>> 385889:1:RESERVING:1151341235:3660:H:v60-n75:singular:1.000000
>>>>> 385889:1:RESERVING:1151341235:3660:H:v60-n66:singular:2.000000
>>>>> 385889:1:RESERVING:1151341235:3660:H:v60-n31:singular:2.000000
>>>>> 385889:1:RESERVING:1151341235:3660:H:v60-n62:singular:1.000000
>>>>> 385889:1:RESERVING:1151341235:3660:H:v60-n15:singular:2.000000
>>>>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n28:slots:2.000000
>>>>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n65:slots:2.000000
>>>>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n75:slots:1.000000
>>>>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n31:slots:2.000000
>>>>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n66:slots:2.000000
>>>>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n62:slots:1.000000
>>>>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n15:slots:2.000000
>>>>> ::::::::
>>>>> 385889:1:RESERVING:1151341250:3660:P:mpich:slots:12.000000
>>>>> 385889:1:RESERVING:1151341250:3660:G:global:ncpus_agerber:12.000000
>>>>> 385889:1:RESERVING:1151341250:3660:H:v60-n28:singular:2.000000
>>>>> 385889:1:RESERVING:1151341250:3660:H:v60-n65:singular:2.000000
>>>>> 385889:1:RESERVING:1151341250:3660:H:v60-n75:singular:1.000000
>>>>> 385889:1:RESERVING:1151341250:3660:H:v60-n62:singular:1.000000
>>>>> 385889:1:RESERVING:1151341250:3660:H:v60-n73:singular:2.000000
>>>>> 385889:1:RESERVING:1151341250:3660:H:v60-n52:singular:2.000000
>>>>> 385889:1:RESERVING:1151341250:3660:H:v60-n66:singular:2.000000
>>>>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n28:slots:2.000000
>>>>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n65:slots:2.000000
>>>>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n75:slots:1.000000
>>>>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n62:slots:1.000000
>>>>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n73:slots:2.000000
>>>>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n52:slots:2.000000
>>>>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n66:slots:2.000000
>>>>> ::::::::
>>>>> 385889:1:RESERVING:1151341265:3660:P:mpich:slots:12.000000
>>>>> 385889:1:RESERVING:1151341265:3660:G:global:ncpus_agerber:12.000000
>>>>> 385889:1:RESERVING:1151341265:3660:H:v60-n28:singular:2.000000
>>>>> 385889:1:RESERVING:1151341265:3660:H:v60-n65:singular:2.000000
>>>>> 385889:1:RESERVING:1151341265:3660:H:v60-n75:singular:1.000000
>>>>> 385889:1:RESERVING:1151341265:3660:H:v60-n62:singular:1.000000
>>>>> 385889:1:RESERVING:1151341265:3660:H:v60-n73:singular:2.000000
>>>>> 385889:1:RESERVING:1151341265:3660:H:v60-n52:singular:2.000000
>>>>> 385889:1:RESERVING:1151341265:3660:H:v60-n66:singular:2.000000
>>>>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n28:slots:2.000000
>>>>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n65:slots:2.000000
>>>>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n75:slots:1.000000
>>>>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n62:slots:1.000000
>>>>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n73:slots:2.000000
>>>>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n52:slots:2.000000
>>>>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n66:slots:2.000000
>>>>> 385865:1:STARTING:1151341250:3660:H:v60-n47:singular:1.000000
>>>>> 385865:1:STARTING:1151341250:3660:Q:all.q@v60-n47:slots:1.000000
>>>>> ::::::::
>>>>> 385865:1:RUNNING:1151341251:3660:H:v60-n47:singular:1.000000
>>>>> 385865:1:RUNNING:1151341251:3660:Q:all.q@v60-n47:slots:1.000000
>>>>> ::::::::
>>>>> 
>>>>> I suspect that SGE behaves this way because:
>>>>> 
>>>>> It seems to me that SGE tries to reserve the processor resources that it
>>>>> expects to be released soonest. By default, SGE determines which CPUs to
>>>>> reserve from h_rt, s_rt or default_duration. However, in our cluster we
>>>>> do not require users to specify h_rt or s_rt, so the default_duration of
>>>>> one hour is used. Therefore, if a serial job finishes very quickly, e.g.
>>>>> after 10 minutes, SGE does not reserve this CPU for the reservation, and
>>>>> serial jobs still fill this CPU at the time it is released. The same
>>>>> holds for the scenario where a long job occupies a CPU for, e.g., 2 days:
>>>>> SGE keeps expecting that this CPU will be released soon and reserves it
>>>>> for the reservation.
>>>>> 
>>>>> My suspicions may be wrong. It would be great if someone having the same
>>>>> problem could observe this in their SGE. If my suspicions are correct, I
>>>>> think this is an odd implementation of reservation, since the reservation
>>>>> should not be based only on the specified runtime.
>>>>> 
>>>>> Cheers.
>>>>> 
>>>>> Best regards,
>>>>> Sili(wesley) Huang
>>>>> 
>>>>> Monday, June 26, 2006, 5:41:25 AM, you wrote:
>>>>> 
>>>>> Andreas> Have you observed reservation behaviour via the 'schedule' file?
>>>>> 
>>>>> Andreas> Andreas
>>>>> 
>>>>> Andreas> On Fri, 23 Jun 2006, Brady Catherman wrote:
>>>>> 
>>>>> >> Yes. If there is space they start fine. If they have reservation
>>>>> >> enabled and a much higher priority than every other single-process
>>>>> >> job, they just sit at the top of the queue as if the reservation is
>>>>> >> not doing anything (max_reservation is currently set to 1000).
>>>>> >> 
>>>>> >> On Jun 23, 2006, at 2:07 PM, Reuti wrote:
>>>>> >> 
>>>>> >>> On 23.06.2006 at 22:45, Brady Catherman wrote:
>>>>> >>> 
>>>>> >>>> I have done both of these and yet my clusters still hate parallel
>>>>> >>>> jobs. Does anybody have this working? Everything I have seen is
>>>>> >>>> that parallel jobs are always shunned by Grid Engine. I would
>>>>> >>>> appreciate any solutions to this being passed my way! =) I have
>>>>> >>>> been working on this on and off since January.
>>>>> >>> 
>>>>> >>> But if the cluster is empty, they are starting? - Reuti
>>>>> >>> 
>>>>> >>>> On Jun 23, 2006, at 11:46 AM, Reuti wrote:
>>>>> >>>> 
>>>>> >>>>> Hi,
>>>>> >>>>> 
>>>>> >>>>> you submitted with "-R y" and adjusted the scheduler to
>>>>> >>>>> "max_reservation 20" or an appropriate value?
>>>>> >>>>> 
>>>>> >>>>> -- Reuti
>>>>> >>>>> 
>>>>> >>>>> On 23.06.2006 at 18:31, Sili (wesley) Huang wrote:
>>>>> >>>>> 
>>>>> >>>>>> Hi Jean-Paul,
>>>>> >>>>>> 
>>>>> >>>>>> I have a similar problem in our cluster: the low-priority serial
>>>>> >>>>>> jobs still get loaded into the run state while the high-priority
>>>>> >>>>>> parallel jobs are waiting. Did you figure out a solution to this
>>>>> >>>>>> problem? Does the upgrade help?
>>>>> >>>>>> 
>>>>> >>>>>> Cheers.
>>>>> >>>>>> 
>>>>> >>>>>> Best regards,
>>>>> >>>>>> Sili(wesley) Huang
>>> 
>>> --
>>> mailto:shuang at unb.ca
>>> Scientific Computing Support
>>> Advanced Computational Research Laboratory
>>> University of New Brunswick
>>> Tel(office):  (506) 452-6348
>>> 