[GE users] Small jobs jumping ahead of large jobs that reserve Virtual_free memory.

reuti reuti at staff.uni-marburg.de
Fri Feb 6 16:40:24 GMT 2009



On 06.02.2009 at 17:09, futurity wrote:

> Hi Reuti,
>
> Sorry, I forgot to mention that I'm using Grid Engine v61.
>
> I've not heard of "h_rt".  I know that we don't use "h_rt" in the  
> arguments to qsub if that helps?

Neil,

h_rt means wall-clock time, i.e. how long the job is allowed to run. If
you don't specify h_rt, resource reservation can't be done in a
reliable way. For all jobs without it, the configured default will be used:

$ qconf -ssconf
...
default_duration    0:10:00
...

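You can try setting it to INFINITY (via qconf -msconf). Then a job
without h_rt is assumed to run forever, so it can't be backfilled in
front of a reservation.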
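
As a sketch of both options: the script names large_job.sh and
small_job.sh and the runtime values below are placeholders, not taken
from your setup. Keep in mind that h_rt is a hard limit, so a job that
runs longer than its h_rt gets killed.

```shell
# Sketch only: script names and runtimes are placeholders.

# Large job: ask for a reservation and state the wall-clock limit.
qsub -R y -l h_rt=2:00:00 -l vf=3600M large_job.sh

# Small job: no reservation, but an honest h_rt, so the scheduler
# does not fall back to default_duration for it.
qsub -R n -l h_rt=0:30:00 -l vf=256M small_job.sh

# Opens the scheduler configuration in your editor; change the
# default_duration line there.
qconf -msconf
```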


> The resource we're using is virtual_free "vf".  The machines all  
> have an initial value of 3600M.  The small jobs require 256M and  
> the larger jobs the full 3600M.
>
> As a result, if a single small job is running on a machine, a large  
> job can't be transferred and run on it.  If a large job is running  
> on a machine, a small job can't be transferred and run on it.
>
> As a small and large job can't co-exist on a machine, I think this  
> means that back filling can't occur.

It can occur. If a node is reserved to be used in 20 minutes, SGE
might put a job there which states that it will need only 10 minutes.
You can find some explanations here:

https://partneradvantage.sun.com/kiosk/ViewPDF?pdf_id=IG5EBGGX8K  
(Chapter 4)
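
The comparison behind this can be sketched as a small shell function.
This only illustrates the idea and is not SGE's scheduler code; the
function name can_backfill and its two arguments are made up for the
example, and the exact rule SGE applies may differ in detail.

```shell
# Illustrative only: not SGE scheduler code.
# can_backfill MINUTES_UNTIL_RESERVATION ASSUMED_JOB_MINUTES
# Prints "yes" when the job's assumed runtime (its h_rt, or
# default_duration if no h_rt was requested) ends no later than
# the start of the reservation, otherwise "no".
can_backfill() {
    until_reservation=$1
    assumed_runtime=$2
    if [ "$assumed_runtime" -le "$until_reservation" ]; then
        echo yes
    else
        echo no
    fi
}

can_backfill 20 10   # node reserved in 20 min, job states 10 min: yes
can_backfill 12 10   # no h_rt, 10-minute default assumed: yes
can_backfill 12 60   # job requests h_rt of 60 min: no
```

The second call is the problematic case: the job is let in because of
the assumed 10 minutes, whatever its real runtime turns out to be.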


> Our problem:
>
> If we have 10 machines, each currently running 1 small job, and we
> submit 10 large jobs with reservation "-R y", followed by 10 small
> jobs without reservation "-R n", we still see some of these small
> jobs being transferred before the large jobs, even though the small
> jobs were submitted later.
>
> Is this because large and small jobs have the same priority, so
> that instead of reserving machines for the large jobs, it just
> allows the smaller jobs to also run on the machines already running
> smaller jobs because they have equal priority as the larger ones?

I don't think so. Instead it might be the backfilling, as the assumed
runtimes of the jobs don't match the real ones.

-- Reuti


> Many thanks again for the help.
>
> Neil
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 06 February 2009 12:02
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Small jobs jumping ahead of large jobs that  
> reserve Virtual_free memory.
>
> Hi,
>
> On 05.02.2009 at 19:07, futurity wrote:
>
>> The admin manual says that the jobs with the highest priority get
>> the earliest resource assignment.
>>
>> I have a few questions about this. When we submit jobs (all with the
>> same priority) using up a consumable virtual_free "vf" memory
>> resource, we see in a queue with a small job running on every
>> machine that small jobs that are requesting a small amount of "vf"
>> but aren't reserving "vf" are transferring and running ahead of large
>> jobs that are requesting a large amount of "vf" and are reserving
>> "vf". These smaller jobs are submitted after the larger jobs are
>> submitted.
>
> What SGE version are you using? Maybe you see the "backfilling", as
> no h_rt was requested and the default_duration is, like in older SGE
> versions, still set to 10 minutes, although the small jobs will
> need more than 10 minutes in the end. SGE might put them on nodes
> which are reserved in 12 minutes, but then the node is still
> blocked by the smaller job.
>
> Do all your jobs request h_rt?
>
> -- Reuti
>
>>
>> 1a)   Does resource assignment and reservation work when jobs have
>> equal priority?
>> 1b)   In the example scenario described above, will the large jobs
>> always be held back as long as there are smaller jobs waiting to be
>> run (queued before it and after it)?
>> 1c)   We don't want the users submitting these larger jobs to have an
>> unfair advantage in the grid, so we ideally don't want them to be
>> able to submit jobs of a higher priority.  Do these larger jobs have
>> to have a higher priority?  If they have to have higher priority, is
>> there a way to stop the users abusing this priority advantage without
>> stopping them from using all machines when the grid is empty?
>>
>> I've tried to test various scenarios to replicate this problem, but
>> it's a nightmare tracking what is going on when I'm submitting jobs to
>> more than one machine at a time.
>>
>> Any help will be really appreciated.
>>
>> Regards
>>
>> Neil
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=102382
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=102438

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
