[GE users] Advance reservation strange behavior

Jean-Paul Minet minet at cism.ucl.ac.be
Tue Jun 27 10:01:53 BST 2006


Hi,

We probably have a similar setup to Sili's: the scheduler's default_duration is set
to 4 days, h_rt for our single all.q queue is set to 10 days, and we don't require
users to specify a (hard or soft) run-time limit.
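
For reference, this is roughly how those two settings can be checked; the values
below simply restate the ones given above, with the qconf output trimmed:

   # scheduler configuration, see sched_conf(5) -- 96:00:00 = 4 days
   $ qconf -ssconf | grep default_duration
   default_duration                  96:00:00

   # run-time limit on the queue -- 240:00:00 = 10 days
   $ qconf -sq all.q | grep h_rt
   h_rt                  240:00:00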

I would have expected that, for a pending job that has reserved resources, the
scheduler would check, whenever a CPU becomes free, whether that CPU is already on
its "reserved" list. If not, it could swap one of the reserved (but still busy)
CPUs for this newly freed one. Or maybe I have misunderstood something...
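
One way to check what the scheduler actually reserves is the 'schedule' trace quoted
further down. As far as I know, that file is only written when monitoring is enabled
in the scheduler configuration, e.g.

   # qconf -msconf, then extend the params line:
   params                            MONITOR=1

after which the reservation decisions show up in <sge_root>/<cell>/common/schedule.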

Jean-paul
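
PS: regarding the sge_request suggestion quoted below: if I read sge_request(5)
correctly, it comes down to adding a single line to the cluster-wide defaults file,
which users can still override with their own -l h_rt at submit time:

   # cluster-wide submit defaults: $SGE_ROOT/<cell>/common/sge_request
   -l h_rt=:10: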

Andreas.Haas at Sun.COM wrote:
> Most probably your suspicion is right. The parallel job's reservation
> indeed becomes worthless if the sequential jobs do not finish
> at the time foreseen by the scheduler and default_duration is not
> enforced by Grid Engine. Have you considered putting
> 
>    -l h_rt=:10:
> 
> into the cluster-wide sge_request(5) file?
> 
> Regards,
> Andreas
> 
> On Mon, 26 Jun 2006, Sili (wesley) Huang wrote:
> 
>>
>> Hi Andreas,
>>
>>
>> I tried to observe what is going on with this strange behavior. It seems to me
>> that the reservation is tied to the specified run length of a job. For example,
>> in this monitoring record (385889 is a parallel job with reservation enabled and
>> a high priority, and 385865 is a serial job):
>>
>>
>> [root common]# cat schedule | egrep "385865|385889|::::::::"
>> ::::::::
>> 385889:1:RESERVING:1151341235:3660:P:mpich:slots:12.000000
>> 385889:1:RESERVING:1151341235:3660:G:global:ncpus_agerber:12.000000
>> 385889:1:RESERVING:1151341235:3660:H:v60-n28:singular:2.000000
>> 385889:1:RESERVING:1151341235:3660:H:v60-n65:singular:2.000000
>> 385889:1:RESERVING:1151341235:3660:H:v60-n75:singular:1.000000
>> 385889:1:RESERVING:1151341235:3660:H:v60-n66:singular:2.000000
>> 385889:1:RESERVING:1151341235:3660:H:v60-n31:singular:2.000000
>> 385889:1:RESERVING:1151341235:3660:H:v60-n62:singular:1.000000
>> 385889:1:RESERVING:1151341235:3660:H:v60-n15:singular:2.000000
>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n28:slots:2.000000
>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n65:slots:2.000000
>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n75:slots:1.000000
>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n31:slots:2.000000
>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n66:slots:2.000000
>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n62:slots:1.000000
>> 385889:1:RESERVING:1151341235:3660:Q:all.q@v60-n15:slots:2.000000
>> ::::::::
>> 385889:1:RESERVING:1151341250:3660:P:mpich:slots:12.000000
>> 385889:1:RESERVING:1151341250:3660:G:global:ncpus_agerber:12.000000
>> 385889:1:RESERVING:1151341250:3660:H:v60-n28:singular:2.000000
>> 385889:1:RESERVING:1151341250:3660:H:v60-n65:singular:2.000000
>> 385889:1:RESERVING:1151341250:3660:H:v60-n75:singular:1.000000
>> 385889:1:RESERVING:1151341250:3660:H:v60-n62:singular:1.000000
>> 385889:1:RESERVING:1151341250:3660:H:v60-n73:singular:2.000000
>> 385889:1:RESERVING:1151341250:3660:H:v60-n52:singular:2.000000
>> 385889:1:RESERVING:1151341250:3660:H:v60-n66:singular:2.000000
>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n28:slots:2.000000
>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n65:slots:2.000000
>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n75:slots:1.000000
>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n62:slots:1.000000
>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n73:slots:2.000000
>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n52:slots:2.000000
>> 385889:1:RESERVING:1151341250:3660:Q:all.q@v60-n66:slots:2.000000
>> ::::::::
>> 385889:1:RESERVING:1151341265:3660:P:mpich:slots:12.000000
>> 385889:1:RESERVING:1151341265:3660:G:global:ncpus_agerber:12.000000
>> 385889:1:RESERVING:1151341265:3660:H:v60-n28:singular:2.000000
>> 385889:1:RESERVING:1151341265:3660:H:v60-n65:singular:2.000000
>> 385889:1:RESERVING:1151341265:3660:H:v60-n75:singular:1.000000
>> 385889:1:RESERVING:1151341265:3660:H:v60-n62:singular:1.000000
>> 385889:1:RESERVING:1151341265:3660:H:v60-n73:singular:2.000000
>> 385889:1:RESERVING:1151341265:3660:H:v60-n52:singular:2.000000
>> 385889:1:RESERVING:1151341265:3660:H:v60-n66:singular:2.000000
>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n28:slots:2.000000
>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n65:slots:2.000000
>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n75:slots:1.000000
>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n62:slots:1.000000
>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n73:slots:2.000000
>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n52:slots:2.000000
>> 385889:1:RESERVING:1151341265:3660:Q:all.q@v60-n66:slots:2.000000
>> 385865:1:STARTING:1151341250:3660:H:v60-n47:singular:1.000000
>> 385865:1:STARTING:1151341250:3660:Q:all.q@v60-n47:slots:1.000000
>> ::::::::
>> 385865:1:RUNNING:1151341251:3660:H:v60-n47:singular:1.000000
>> 385865:1:RUNNING:1151341251:3660:Q:all.q@v60-n47:slots:1.000000
>> ::::::::
>>
>>
>> I suspect that SGE behaves this way because:
>>
>> It seems to me that SGE tries to reserve the processor resources that it expects
>> to be released soonest. SGE decides which CPUs to reserve based on h_rt or s_rt,
>> or on default_duration if neither is given. However, in our cluster we do not
>> require users to specify h_rt or s_rt, so the default_duration of one hour is
>> used. Therefore, if a serial job finishes much sooner than assumed, e.g. after
>> only 10 minutes, SGE has not reserved that CPU for the parallel job, and other
>> serial jobs fill it again as soon as it is released. The same goes for the
>> scenario where a long job occupies a CPU for, say, 2 days: SGE keeps expecting
>> that CPU to be released soon and reserves it for the parallel job.
>>
>>
>> My suspicion may be wrong; it would be great if someone with the same problem
>> could check this on their own SGE installation. If it is correct, I think this
>> is an odd way to implement reservation, since reservations should not be based
>> only on the specified runtime.
>>
>>
>> Cheers. 
>>
>>
>> Best regards,
>>
>> Sili(wesley) Huang
>>
>>
>> Monday, June 26, 2006, 5:41:25 AM, you wrote:
>>
>>
>> Andreas> Have you observed reservation behaviour via the 'schedule' file?
>>
>>
>> Andreas> Andreas
>>
>>
>> Andreas> On Fri, 23 Jun 2006, Brady Catherman wrote:
>>
>>
>> >> Yes. If there is space they start fine. If they have reservation enabled,
>> >> and they have a much higher priority than every other single-process job,
>> >> they just sit at the top of the queue as if the reservation is not doing
>> >> anything (max_reservation is currently set to 1000).
>>
>>
>>
>> >> On Jun 23, 2006, at 2:07 PM, Reuti wrote:
>>
>>
>> >>> On 23.06.2006 at 22:45, Brady Catherman wrote:
>>
>>
>> >>>> I have done both of these and yet my clusters still hate parallel jobs.
>> >>>> Does anybody have this working? Everything I have seen is that parallel
>> >>>> jobs are always shunned by Grid Engine. I would appreciate any solutions
>> >>>> to this being passed my way! =) I have been working on this on and off
>> >>>> since January.
>>
>>
>> >>> But if the cluster is empty, they do start? - Reuti
>>
>>
>>
>>
>> >>>> On Jun 23, 2006, at 11:46 AM, Reuti wrote:
>>
>>
>> >>>>> Hi,
>>
>>
>> >>>>> you submitted with "-R y" and adjusted the scheduler to
>> >>>>> "max_reservation 20" or an appropriate value?
>>
>>
>> >>>>> -- Reuti
>>
>>
>>
>> >>>>> On 23.06.2006 at 18:31, Sili (wesley) Huang wrote:
>>
>>
>> >>>>>> Hi Jean-Paul,
>>
>>
>>
>>
>> >>>>>> I have a similar problem to yours in our cluster: the low-priority
>> >>>>>> serial jobs still get loaded into the running state while the
>> >>>>>> high-priority parallel jobs keep waiting. Did you figure out a solution
>> >>>>>> to this problem? Does the upgrade help?
>>
>>
>>
>>
>> >>>>>> Cheers.
>>
>>
>>
>>
>> >>>>>> Best regards,
>>
>>
>> >>>>>> Sili(wesley) Huang
>>
>>
>> --
>> mailto:shuang at unb.ca
>> Scientific Computing Support
>> Advanced Computational Research Laboratory
>> University of New Brunswick
>> Tel(office): (506) 452-6348
>>
>>
>>
> 
> 

-- 
Jean-Paul Minet
Administrator, CISM - Institut de Calcul Intensif et de Stockage de Masse
Université Catholique de Louvain
Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



