[GE users] SGE memory management

Ron Chen ron_chen_123 at yahoo.com
Wed Sep 24 02:46:37 BST 2008


--- On Wed, 9/24/08, Mag Gam <magawake at gmail.com> wrote:
> Can you elaborate more on #4? How does the mechanism work? 

It's done by the PDC (Portable Data Collector), which can find all the processes that belong to a job.
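
To make the idea concrete, here is a toy sketch in Python of job-level accounting: group the processes by job, sum their memory, and compare the sum with the job limit. This is not the PDC code and not how the PDC discovers processes. The ProcInfo record, the job ids, and the pids are made up for illustration; the 1.5 GB / 2 GB numbers are the ones from the example quoted further down.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

GB = 1024 ** 3


class ProcInfo(NamedTuple):
    """Hypothetical per-process record; not an SGE data structure."""
    pid: int
    job_id: int
    vmem_bytes: int


def job_usage(procs: Iterable[ProcInfo]) -> Dict[int, int]:
    """Sum the memory of all processes that belong to each job."""
    totals: Dict[int, int] = defaultdict(int)
    for p in procs:
        totals[p.job_id] += p.vmem_bytes
    return dict(totals)


def jobs_over_limit(procs: Iterable[ProcInfo],
                    limits: Dict[int, int]) -> List[int]:
    """Return the ids of jobs whose summed usage exceeds their limit."""
    usage = job_usage(procs)
    return sorted(job for job, used in usage.items()
                  if job in limits and used > limits[job])


if __name__ == "__main__":
    # Job 1: two processes of 1.5 GB each, both under a 2 GB process limit.
    # Job 2: one process of 1 GB.
    procs = [
        ProcInfo(pid=101, job_id=1, vmem_bytes=int(1.5 * GB)),
        ProcInfo(pid=102, job_id=1, vmem_bytes=int(1.5 * GB)),
        ProcInfo(pid=201, job_id=2, vmem_bytes=1 * GB),
    ]
    limits = {1: 2 * GB, 2: 2 * GB}
    print(jobs_over_limit(procs, limits))
```

Each process of job 1 stays below 2 GB on its own, so a per-process limit never triggers; only the per-job sum (3 GB) shows that the job is over its limit.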

> Is there a
> grace period if the job goes over 2GB? or does it kill
> immediately?

I think SGE does not give the job a "grace period" once it finds out that the limit has been exceeded. However, you can write a custom "terminate_method" to define your own way to kill the job.

http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/htmlman/htmlman5/queue_conf.html
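
As a rough idea of what such a script could look like, here is a minimal Python sketch that sends one signal, waits for a grace period, and only then sends SIGKILL. It is an illustration, not something shipped with SGE. The script name, the 30 second default, and the choice of SIGTERM are my assumptions. It also assumes the queue passes the job's pid as the first argument (for example through the $job_pid variable described in queue_conf(5); please check the man page for your version), and it signals only that one pid, not the other processes of the job.

```python
#!/usr/bin/env python3
import os
import signal
import sys
import time


def is_alive(pid: int) -> bool:
    """True if a process with this pid still exists.

    A process that has exited but has not been reaped by its parent
    (a zombie) still counts as existing.
    """
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists, but belongs to another user
    return True


def terminate_with_grace(pid: int,
                         first_signal: int = signal.SIGTERM,
                         grace_seconds: float = 30.0,
                         poll_seconds: float = 0.5) -> str:
    """Send first_signal, wait up to grace_seconds, then send SIGKILL.

    Returns "gone" if the pid did not exist, "exited" if it went away
    within the grace period, and "killed" if SIGKILL had to be sent.
    """
    try:
        os.kill(pid, first_signal)
    except ProcessLookupError:
        return "gone"
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "exited"
        time.sleep(poll_seconds)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "exited"
    return "killed"


# Hypothetical usage: term_grace.py <pid> [grace_seconds]
if __name__ == "__main__" and len(sys.argv) > 1:
    grace = float(sys.argv[2]) if len(sys.argv) > 2 else 30.0
    print(terminate_with_grace(int(sys.argv[1]), grace_seconds=grace))
```

The queue entry would then be something along the lines of "terminate_method /path/to/term_grace.py $job_pid", where the path is a placeholder.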

With a few test jobs you can easily test a terminate_method that has a grace period and sends the signal you want to the job. A good test case is a simple C program that allocates memory in a loop and prints whether each allocation succeeded.
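
Here is a minimal sketch of such a test job, written in Python rather than C to keep it short. The function name, the step size, and the cap are my own choices, not anything SGE defines. Under a kernel-enforced process limit the allocation should eventually fail with MemoryError instead of the loop reaching the cap.

```python
import sys


def allocate_in_steps(step_mb: int = 100, max_mb: int = 4096) -> int:
    """Allocate memory step_mb at a time, reporting each attempt.

    Stops at the first failed allocation or before going past max_mb.
    Returns the number of MB successfully allocated.
    """
    chunks = []  # keep references so that nothing is freed
    allocated = 0
    while allocated + step_mb <= max_mb:
        try:
            chunks.append(bytearray(step_mb * 1024 * 1024))
        except MemoryError:
            print(f"allocation FAILED after {allocated} MB", flush=True)
            break
        allocated += step_mb
        print(f"allocated {allocated} MB OK", flush=True)
    return allocated


# Hypothetical usage inside a job script: memtest.py <step_mb> <max_mb>
if __name__ == "__main__" and len(sys.argv) == 3:
    allocate_in_steps(int(sys.argv[1]), int(sys.argv[2]))
```

Submitting it with a cap well above the job's memory limit, once as a single process and once as several copies started from the same job script, is one way to exercise the process limit and the job limit separately.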

 -Ron



> Is it a good practice to do this? Also, can you give me an example of
> when and why I would need to kill the user process (I know I asked
> this question before and I should know the answer, but I want to be
> certain I am on the right track and approaching the problem with
> sanity).
> 
> TIA
> 
> 
> 
> 
> On Tue, Sep 23, 2008 at 9:23 PM, Ron Chen <ron_chen_123 at yahoo.com> wrote:
> > --- On Wed, 9/24/08, Mag Gam <magawake at gmail.com> wrote:
> >> Cool. To sum up the kernel takes care of it. This is what I
> >> was looking for. Thanks again Ron!
> >
> > Actually it is a bit more complicated than that.
> >
> > There are 2 limits: a process limit and a job limit. A Grid Engine
> > job is a collection of processes started by the job script, but most
> > kernels have no concept of a job; they deal with processes only.
> >
> > So, when SGE starts a job, the SGE execd sets the process limit, and
> > this limit is enforced by the kernel. Any process that wants to
> > allocate more memory needs to ask the kernel, and the kernel may or
> > may not allow the process to get more memory based on this limit.
> >
> > For the job limit, SGE finds all the processes that belong to a job
> > and how much memory each one uses. When the sum over all the
> > processes of a job exceeds the limit, SGE may kill the job.
> >
> > As an example, let's say we have a job with a limit of 2 GB of
> > memory. These are the steps involved:
> >
> > 1) SGE starts the job by executing the job script with a resource
> > limit of 2 GB.
> >
> > 2) If any process exceeds 2 GB of memory usage, the kernel would not
> > allow that process to get more memory.
> >
> > 3) However, most operating system kernels do not know what a "job"
> > is, so it is possible for a job to start 2 processes that each get
> > 1.5 GB of memory.
> >
> > 4) SGE has its own mechanism to find out which process belongs to
> > which job. So when it finds 2 processes belonging to a job with a
> > 2 GB memory limit whose sum is greater than 2 GB (here,
> > 2 * 1.5 GB = 3 GB), SGE kills the job if the 2 GB limit is a "hard"
> > limit rather than a "soft" one.
> >
> >  -Ron
> >
> >
> >
> >>
> >>
> >> On Tue, Sep 23, 2008 at 8:51 PM, Ron Chen <ron_chen_123 at yahoo.com> wrote:
> >> > --- On Wed, 9/24/08, Mag Gam <magawake at gmail.com> wrote:
> >> >> Yah, that question says how to do it, but I wanted to learn
> >> >> the internals on exactly how it does it :-)
> >> >
> >> > Then it's not your fault, but rather mine! 8-) I thought your
> >> > main goal was to find out how to do it and test it.
> >> >
> >> > If you want to know how SGE does it, Reuti summarized it in this
> >> > email:
> >> >
> >> > http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=25932
> >> >
> >> > And if you want to play with the SGE module that finds the
> >> > resource usage of the job, and see how it enforces the job limit
> >> > by summing up all the processes in a job, you can find some design
> >> > docs in the SGE source directory.
> >> >
> >> >  -Ron
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >>
> >> >> So it basically sets the ulimit of the process and user to
> >> >> control the memory, eh?
> >> >>
> >> >> Sorry for the redundant question!
> >> >>
> >> >>
> >> >> On Tue, Sep 23, 2008 at 8:34 PM, Ron Chen <ron_chen_123 at yahoo.com> wrote:
> >> >> > Wasn't it asked and answered before?
> >> >> >
> >> >> > http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=25942
> >> >> >
> >> >> > And for the 2nd question: I believe SGE sets the OS's process
> >> >> > resource limit when you submit a job with a limit. You can find
> >> >> > out the answer by putting "limit" or "ulimit" in the job.
> >> >> >
> >> >> >  -Ron
> >> >> >
> >> >> >
> >> >> > --- On Wed, 9/24/08, Mag Gam <magawake at gmail.com> wrote:
> >> >> >> From: Mag Gam <magawake at gmail.com>
> >> >> >> Subject: [GE users] SGE memory management
> >> >> >> To: users at gridengine.sunsource.net
> >> >> >> Date: Wednesday, September 24, 2008, 8:26 AM
> >> >> >>
> >> >> >> I am curious how SGE allocates memory to a process.
> >> >> >>
> >> >> >> Let's say you submit a job and you specify to only use 16G
> >> >> >> of memory. All the exec hosts have 128GB of memory. Does SGE
> >> >> >> cap the process to 16G? I thought that was a function of the
> >> >> >> OS to control how memory is managed and not SGE. Sorry if
> >> >> >> this is a newbie question.
> >> >> >>
> >> >> >> TIA
> >> >> >>
> >> >> >>
> >> >> >> ---------------------------------------------------------------------
> >> >> >> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >> >> >> For additional commands, e-mail: users-help at gridengine.sunsource.net


      
