[GE users] SGE resources and job queues.

Jon Savian worknit at gmail.com
Wed May 11 22:36:04 BST 2005


Luckily I won't need to disrupt already running jobs, just ones that
are waiting to run.

Thanks.

On 5/11/05, Chris Dagdigian <dag at sonsorol.org> wrote:
> 
> Grid Engine 6.x has the concept of "hostgroups" which may be easier
> to set up if you want to group your compute resources by rack
> location. Otherwise you are dead on with the resource idea -- you can
> attach arbitrary resources to nodes that your users can make hard
> requests on.
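
As a rough illustration of a hard request against a per-rack resource, the sketch below assembles a qsub command line in Python. The resource name rack_a, the parallel environment name mpi and the script name big_job.sh are made-up placeholders, and treating the rack resource as a boolean is an assumption; only -hard, -l and -pe are standard qsub options.

```python
import shlex

def rack_qsub_args(resource, pe_name, slots, script):
    """Return a qsub argument list for a job pinned to one rack via a hard request."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return [
        "qsub",
        "-hard", "-l", f"{resource}=TRUE",  # hard request on a (hypothetical) boolean rack resource
        "-pe", pe_name, str(slots),          # parallel environment name and total slot count
        script,
    ]

# 32 nodes x 2 slots per node = 64 slots (the figures mentioned in this thread)
args = rack_qsub_args("rack_a", "mpi", 32 * 2, "big_job.sh")
print(shlex.join(args))
```

Building the arguments as a list rather than one string keeps quoting problems out of the picture if the list is later handed to subprocess.run.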
> 
> The big issue for you is where you mention "...means moving the other
> jobs that users submitted to other nodes...."
> 
> This is not easy to make happen. By default Grid Engine will never
> mess with a running job -- the way Grid Engine makes policy based
> resource allocation happen is by manipulating the order of items
> waiting in the pending list.  It will not screw around with running
> jobs that have already been dispatched to nodes. { unless you
> explicitly configure it to do so ... }
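
To make the "only the pending list is reordered" point concrete, here is a toy model in Python. It is not how the Grid Engine scheduler is implemented; the Job class, the single priority number and schedule_pass are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    priority: float        # toy policy value: higher is dispatched earlier
    running: bool = False

def schedule_pass(jobs, free_slots):
    """One toy scheduling pass: reorder only the pending jobs, then dispatch."""
    pending = sorted((j for j in jobs if not j.running),
                     key=lambda j: j.priority, reverse=True)
    dispatched = []
    for job in pending[:free_slots]:
        job.running = True
        dispatched.append(job.name)
    return dispatched

jobs = [
    Job("small-1", 1.0, running=True),  # already on a node, so never touched
    Job("small-2", 1.0),
    Job("big", 5.0),
]
print(schedule_pass(jobs, free_slots=1))  # prints ['big']
```

The running job keeps its slot no matter how high the new job's priority is; the big job only jumps ahead of small-2 in the queue.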
> 
> So by default there is nothing in SGE that will "move jobs to
> different nodes" -- you'll have to make that happen yourself and it
> tends to be application specific in how this actually happens
> cleanly.  There are clear mechanisms for doing this (job migration /
> checkpoint / restart) but this is not something that is implicit,
> easy or automatic.
> 
> If you have the source code to these applications and you can
> implement checkpoint/restart features then you may be able to easily
> use the SGE migration features to bounce jobs from node to node. This
> would certainly give you the freedom you need but relatively few
> people are in a position where 100% of their cluster jobs are
> checkpoint-able and subject to seamless migration.
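
For anyone wondering what "implement checkpoint/restart" means at the application level, a minimal sketch follows. It assumes the job's whole state fits in a small JSON file; the function name, the file name and the summing workload are invented for the example and have nothing to do with Grid Engine itself.

```python
import json
import os
import tempfile

def run(total_steps, ckpt_path, stop_after=None):
    """Sum 0..total_steps-1, saving progress so a restart resumes where it stopped."""
    state = {"step": 0, "acc": 0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)          # resume from the last checkpoint
    done = 0
    while state["step"] < total_steps:
        if stop_after is not None and done >= stop_after:
            return None                   # simulate the job being interrupted
        state["acc"] += state["step"]
        state["step"] += 1
        done += 1
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, ckpt_path)        # atomic rename: never a half-written checkpoint
    return state["acc"]

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "state.json")
    print(run(10, path, stop_after=4))    # interrupted after 4 steps: prints None
    print(run(10, path))                  # restarted, resumes at step 4: prints 45
```

A job written this way can be killed on one node and restarted on another as long as the checkpoint file lives on shared storage. On the Grid Engine side such jobs are submitted against a checkpointing environment with qsub -ckpt.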
> 
> So you may run into difficulties when running jobs have already been
> dispatched to the "big" resources (such as a rack of nodes), but you
> do have some opportunities for making these sorts of things happen
> with jobs that are still waiting for dispatch.
> 
> I'll mention some possibilities below that could be worth
> investigating but they fall well outside the realm of "what I've
> actually implemented myself" so take them with a grain of salt!
> 
> (1) you may be able to use the Grid Engine resource reservation and
> backfill mechanisms as a way to reserve entire racks for a set of
> jobs. This approach works best in areas where users are able to
> accurately predict the runtime their jobs need so that the backfill
> works efficiently.  The concept of resource reservation was invented
> (I think) to cover exactly these sorts of situations you are describing.
> 
> (2) Another option may be to investigate the urgency sub policy.
> There is a way to attach urgency values to resources such as "Rack_A"
> so that jobs requesting the resource end up with a higher entitlement
> share. The pending list is then reorganized to boost those jobs higher
> in the list, so they get first crack at Rack_A job slots as running
> jobs drain out.
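
The two ideas above, reservation with backfill and urgency ordering, can be pictured with the toy functions below. This is a simplified model under my own assumptions: one rack, known end times for running jobs, and a flat urgency value per resource. The function names and the job dictionaries are invented; the real scheduler is more involved.

```python
def reservation_start(running_end_times):
    """Earliest time the whole rack is free: when the last running job ends."""
    return max(running_end_times, default=0)

def can_backfill(now, declared_runtime, start):
    """A short job may use idle slots only if it finishes before the reservation."""
    return now + declared_runtime <= start

def order_by_urgency(pending, resource_urgency):
    """Sort pending jobs so that requests for high-urgency resources come first."""
    def urgency(job):
        return sum(resource_urgency.get(r, 0) for r in job["requests"])
    return sorted(pending, key=urgency, reverse=True)  # stable: ties keep submit order

start = reservation_start([30, 45, 60])      # running jobs end at t=30, 45 and 60
print(start)                                 # prints 60
print(can_backfill(0, 50, start))            # prints True: done by t=50
print(can_backfill(0, 90, start))            # prints False: would delay the big job

pending = [
    {"name": "small", "requests": ["mem"]},
    {"name": "big", "requests": ["rack_a"]},
]
ordered = order_by_urgency(pending, {"rack_a": 1000})
print([j["name"] for j in ordered])          # prints ['big', 'small']
```

This also shows why accurate runtime estimates matter: a job that declares 90 when it really needs 50 is kept out of slots it could have used. In Grid Engine 6 a job asks for a reservation with qsub -R y, reservations have to be enabled through max_reservation in the scheduler configuration, and the per-resource urgency value is a column in the complex configuration.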
> 
> Also you may want to read the official SGE 6.0x documentation
> available at this URL:
> http://docs.sun.com/app/docs/coll/1017.3?q=N1GE
> 
> The various resource allocation policies are covered in far greater
> detail than the resource.html doc you referenced.
> 
> Regards,
> Chris
> 
> 
> On May 11, 2005, at 4:57 PM, Jon Savian wrote:
> 
> > Hi Reuti,
> >
> > Thanks for your prompt response.  Users usually run scientific
> > programs and request whatever resources they need for the job.  So
> > yes, they specify runtime, memory, and number of slots needed.
> >
> > Users have expressed interest in running larger jobs that require 32
> > nodes, each with 2 slots and 2GB of memory.  However, they would
> > like jobs to run on nodes in the same rack, instead of
> > using nodes across multiple racks.  We have multiple racks of 32
> > nodes.  Hard requests will be needed, I believe.
> >
> > So the first step I took was to specify a resource for one of the 32
> > node racks.  So when a user does a "qsub -l resource_name....." it
> > will run on the 32 nodes that carry that resource.  However other users
> > might have already submitted jobs that are queued to run on some of
> > the nodes we will need for our larger 32 node single rack job.  So
> > ideally, I think we would want to find a way to make the
> > single rack available so that the larger 32 node single rack job can
> > run ASAP, which means moving the other jobs that users submitted to
> > other nodes.  This may happen on a regular basis, so any kind of
> > permanent setting for this would be great.
> >
> > I should also mention that I am making all modifications via qmon.
> >
> > Thanks.
> >
> > Jon
> >
> >
> > They will be running a job on 32 nodes, each having 2GB memory, 2
> > slots/node.
> >
> > On 5/11/05, Reuti <reuti at staff.uni-marburg.de> wrote:
> >
> >> Hi Jon,
> >>
> >> can you give more details: what exactly do you mean by small and
> >> large jobs?  The runtime, the memory request, the number of slots?
> >>
> >> And: is resource2 a hard request for the small jobs?
> >>
> >> Anyway: Two possibilities to look at are soft requests (for
> >> resource1 for the small jobs), or putting a sequence number on the
> >> nodes, so that resource1 nodes are filled first.
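
A small Python model of these two suggestions, soft requests and sequence numbers, might look like this. The host dictionaries, the node names and pick_host are invented for the example; it only mimics the selection order, not the actual scheduler.

```python
def pick_host(hosts, soft_request=None):
    """Pick a host with a free slot.

    Hosts offering the soft-requested resource are preferred; otherwise
    (and for jobs with no soft request) the lowest sequence number wins.
    """
    free = [h for h in hosts if h["free_slots"] > 0]
    if not free:
        return None

    def rank(host):
        misses_soft = soft_request is not None and soft_request not in host["resources"]
        return (misses_soft, host["seq_no"])

    return min(free, key=rank)["name"]

hosts = [
    {"name": "node-b1", "resources": {"resource2"}, "seq_no": 20, "free_slots": 2},
    {"name": "node-a1", "resources": {"resource1"}, "seq_no": 10, "free_slots": 1},
]
print(pick_host(hosts))                            # prints node-a1 (lowest seq_no)
print(pick_host(hosts, soft_request="resource2"))  # prints node-b1 (soft request honoured)
hosts[0]["free_slots"] = 0
print(pick_host(hosts, soft_request="resource2"))  # prints node-a1 (falls back, no waiting)
```

The last call is the difference from a hard request: a soft request expresses a preference, so the job still runs elsewhere when the preferred nodes are full. In Grid Engine, soft requests are written as qsub -soft -l, and sequence numbers are the seq_no attribute of a queue, used when queue_sort_method is set to seqno in the scheduler configuration.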
> >>
> >> Cheers - Reuti
> >>
> >>
> >> Quoting Jon Savian <worknit at gmail.com>:
> >>
> >>
> >>> Hi Everyone,
> >>>
> >>> I am trying to allocate resources on a cluster, so I followed the
> >>> steps here:
> >>> http://gridengine.sunsource.net/project/gridengine/howto/resource.html
> >>> Let's say I created two resources, we'll call them resource1 and
> >>> resource2.  I want to be able to run a large job using resource2,
> >>> but if there are a lot of smaller jobs queued to run on resource2
> >>> then the larger job will have to wait until the smaller ones
> >>> execute.  Is there any way to move smaller jobs from the nodes on
> >>> resource2 and put them on resource1 (or any other non-resource2
> >>> nodes for that matter) so that the larger job may run on resource2
> >>> ASAP?  Or even better, are there any priorities that can be set
> >>> with the larger job that will put it before the smaller ones?
> >>>
> >>> Thanks.
> >>>
> >>> Jon
> >>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
>
