[GE users] Small jobs jumping ahead of large jobs that reserve Virtual_free memory.

futurity neil at futurity.co.uk
Fri Feb 6 16:09:44 GMT 2009



Hi Reuti,

Sorry, I forgot to mention that I'm using Grid Engine v6.1.

I've not heard of "h_rt".  We don't pass "h_rt" in the arguments to qsub, if that helps.

The resource we're using is virtual_free "vf".  The machines all have an initial value of 3600M.  The small jobs require 256M and the larger jobs the full 3600M.

As a result, if a single small job is running on a machine, a large job can't be transferred and run on it.  If a large job is running on a machine, a small job can't be transferred and run on it.

As a small and large job can't co-exist on a machine, I think this means that back filling can't occur.
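To make the arithmetic concrete, here is a minimal Python sketch of the consumable check as I understand it.  The function name "fits" and the per-host bookkeeping are my own illustration, not Grid Engine code; only the 3600M and 256M figures come from our setup.

```python
HOST_VF_MB = 3600   # initial virtual_free on every machine
SMALL_MB = 256      # vf requested by a small job
LARGE_MB = 3600     # vf requested by a large job


def fits(running_mb, request_mb, capacity_mb=HOST_VF_MB):
    """Return True if a job requesting request_mb still fits on a host.

    running_mb is the list of vf requests of jobs already on the host.
    This is a simplified model of a consumable, not the real scheduler.
    """
    return sum(running_mb) + request_mb <= capacity_mb


# One small job running: a large job no longer fits, another small one does.
print(fits([SMALL_MB], LARGE_MB))
print(fits([SMALL_MB], SMALL_MB))
# One large job running: nothing else fits.
print(fits([LARGE_MB], SMALL_MB))
```

So a machine holding even one small job can keep accepting small jobs, but a large job only fits once the machine is completely empty.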

Our problem:

Say we have 10 machines, each currently running 1 small job.  We submit 10 large jobs with reservation ("-R y"), followed by 10 small jobs without reservation ("-R n").  Even though the small jobs were submitted after the large jobs, we still see some of them being transferred before the large jobs.

Is this because large and small jobs have the same priority, so that instead of reserving machines for the large jobs, Grid Engine just lets the smaller jobs run on the machines already running smaller jobs?

Many thanks again for the help.

Neil

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: 06 February 2009 12:02
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Small jobs jumping ahead of large jobs that reserve Virtual_free memory.

Hi,

Am 05.02.2009 um 19:07 schrieb futurity:

> The admin manual says that the jobs with the highest priority get
> the earliest resource assignment.
>
> I have a few questions about this. When we submit jobs (all with the
> same priority) that use a consumable virtual_free "vf" memory
> resource, we see, in a queue with a small job running on every
> machine, that small jobs that request a small amount of "vf"
> but aren't reserving "vf" are transferring and running ahead of large
> jobs that request a large amount of "vf" and are reserving
> "vf".  These smaller jobs are submitted after the larger jobs are
> submitted.

What SGE version are you using? Maybe you are seeing backfilling: no h_rt was requested, and default_duration is, as in older SGE versions, still set to 10 minutes, although the small jobs will need more than 10 minutes in the end. SGE might put them on nodes that are reserved starting in 12 minutes, but then the node is still blocked by the smaller job.
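As a rough illustration of this effect, here is a small Python sketch. It is a simplified model, not SGE's scheduler code: the function names are made up, and the 45-minute runtime is a hypothetical value. Only the 10-minute default_duration and the 12-minute reservation come from the explanation above.

```python
DEFAULT_DURATION_MIN = 10  # runtime assumed for a job that requests no h_rt


def can_backfill(now_min, reservation_start_min, h_rt_min=None):
    """Backfill a job only if its assumed end is before the reservation starts."""
    assumed = h_rt_min if h_rt_min is not None else DEFAULT_DURATION_MIN
    return now_min + assumed <= reservation_start_min


def reservation_delay(now_min, reservation_start_min, actual_runtime_min):
    """Minutes by which a backfilled job overruns the reserved start time."""
    return max(0, now_min + actual_runtime_min - reservation_start_min)


# Node reserved for a large job in 12 minutes; small job has no h_rt.
print(can_backfill(0, 12))               # assumed 10 min, so it is backfilled
print(reservation_delay(0, 12, 45))      # hypothetical real runtime of 45 min
# The same small job with an honest h_rt of 45 minutes is not backfilled.
print(can_backfill(0, 12, h_rt_min=45))
```

In this model the small job is started because it is assumed to finish in time, and the large job's reserved start then slips by however long the small job really overruns.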

Do all your jobs request h_rt?

-- Reuti

>
> 1a)   Does resource assignment and reservation work when jobs have
> equal priority?
> 1b)   In the example scenario described above, will the large jobs
> always be held back as long as there are smaller jobs waiting to be
> run (queued before them and after them)?
> 1c)   We don't want the users submitting these larger jobs to have an
> unfair advantage in the grid, so we ideally don't want them to be able
> to submit jobs of a higher priority.  Do these larger jobs have to
> have a higher priority?  If they have to have higher priority, is
> there a way to stop the users abusing this priority advantage without
> stopping them from using all machines when the grid is empty?
>
> I've tried to test various scenarios to replicate this problem, but
> it's a nightmare tracking what is going on when I'm submitting jobs to
> more than one machine at a time.
>
> Any help will be really appreciated.
>
> Regards
>
> Neil

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=102382

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=102428