[GE users] Resource reservation and parallel jobs

dougalb dougal.lists at gmail.com
Tue Jun 2 09:20:23 BST 2009


Joseph,

- how many queues do you have
  We are trying to achieve a solution using a single queue.

- do you have any usage policy in place
  We have set up fairshare.

- do you have any subordinate queues
  No

- do your parallel jobs need exclusive nodes
  That would be nice, but it is not possible until 6.2u3 goes into production.

- do you have jobs stacking enabled for serial jobs
  Yes: queue_sort_method=load and load_formula=slots

- did you map nodes to different queues or node to queue mapping overlaps?
  No, all nodes are assigned to a single queue, with the number of slots equal to the number of cores.
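For anyone trying to reproduce this setup, here is a rough sketch of how the scheduler settings mentioned in this thread might appear in the Grid Engine scheduler configuration (displayed with `qconf -ssconf`, edited with `qconf -msconf`). Only the relevant parameters are shown. It assumes that the reservation "setting of 20" from the original message refers to `max_reservation`, and it writes the 45-second scheduling interval in SGE's h:m:s notation; both are assumptions rather than details confirmed in the thread.

```
schedule_interval                 0:0:45
queue_sort_method                 load
load_formula                      slots
max_reservation                   20
```

According to the sched_conf(5) documentation, reservations are made only for jobs submitted with `qsub -R y`, and backfilling around a reservation depends on jobs declaring a run-time limit (for example `h_rt`) or on the scheduler's `default_duration`.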

Kind regards,

Dougal

On Sat, May 30, 2009 at 7:50 PM, hargitai <joseph.hargitai at nyu.edu> wrote:
> Since many of us are struggling with similar issues, perhaps it would help to break the problem down a bit:
>
> - how many queues do you have
> - do you have any usage policy in place
> - do you have any subordinate queues
> - do your parallel jobs need exclusive nodes
> - do you have jobs stacking enabled for serial jobs
> - did you map nodes to different queues or node to queue mapping overlaps?
>
> best
> joseph
>
> ----- Original Message -----
> From: dougalb <dougal.lists at gmail.com>
> Date: Saturday, May 30, 2009 11:16 am
> Subject: Re: [GE users] Resource reservation and parallel jobs
>
>> Anybody got any thoughts?
>>
>> On Tue, May 26, 2009 at 10:13 PM, dougalb <dougal.lists at gmail.com> wrote:
>> > Hi all,
>> >
>> >
>> >
>> > I have a question about "resource reservation" in SGE. We are using
>> > 6.2u2_1 with a single queue.
>> >
>> > I am currently setting up a new 128-node (1,024-core) cluster for an
>> > R&D environment. There is a large mix of batch and parallel jobs. One
>> > of the users submits large numbers (~20,000) of 30-minute batch jobs,
>> > which fill the cluster. This is obviously causing parallel job starvation.
>> >
>> > To try to resolve this, I have enabled resource reservation with a
>> > setting of 20. This does not appear to be helping the parallel jobs.
>> > The first thing I noticed is that not enough slots were becoming free
>> > per scheduling interval, so I changed the interval from 15 seconds to
>> > 45 seconds. This does seem to help, but it does not really solve the
>> > issue and has added latency to scheduling.
>> >
>> > Is there a better approach to this problem?
>> >
>> > Kind regards,
>> >
>> > Dougal
>> >
>> > ------------------------------------------------------
>> > http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=199042
>> >
>> > To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>> >
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=199876
>>
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=199887
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=200316



