[GE users] SGE memory management

Mag Gam magawake at gmail.com
Wed Sep 24 02:28:27 BST 2008



Yes, this is the kind of explanation I was looking for. Even though I
am a newbie, please don't shy away from being technical. Please use
your full ability :-)

Can you elaborate on #4? How does the mechanism work? Is there a
grace period if the job goes over 2GB, or is it killed immediately? Is
it good practice to do this? Also, can you give me an example of
when and why I would need to kill the user process? (I know I asked
this question before and I should know the answer, but I want to be
certain I am on the right track and approaching the problem
sanely.)

TIA




On Tue, Sep 23, 2008 at 9:23 PM, Ron Chen <ron_chen_123 at yahoo.com> wrote:
> --- On Wed, 9/24/08, Mag Gam <magawake at gmail.com> wrote:
>> Cool. To sum up the kernel takes care of it. This is what I
>> was looking for. Thanks again Ron!
>
> Actually it is a bit more complicated than that.
>
> There are 2 limits: a process limit and a job limit. A Grid Engine job is a collection of processes started by the job script. Most kernels have no concept of a job; they deal with processes only.
>
> So, when SGE starts a job, the SGE execd sets the process limit, and this limit is enforced by the kernel. Any process that wants to allocate more memory has to ask the kernel, and the kernel may or may not grant the request based on this limit.
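
A minimal Python sketch of the per-process part described above, not SGE's actual code: a parent sets a resource limit on a child before it starts, and the kernel then refuses an oversized allocation. Using RLIMIT_AS, the 512 MiB figure, and the names apply_limit and run_with_limit are all assumptions made for illustration.

```python
import resource
import subprocess
import sys

LIMIT_BYTES = 512 * 1024 * 1024  # hypothetical per-process cap (512 MiB)

# The child tries to allocate 1 GiB, which is above the cap.
CHILD_CODE = """
try:
    buf = bytearray(1024 * 1024 * 1024)
    print('allocated')
except MemoryError:
    print('denied')
"""


def apply_limit():
    # Runs in the child between fork and exec. From here on the kernel,
    # not the parent, enforces the limit.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    cap = LIMIT_BYTES if hard == resource.RLIM_INFINITY else min(LIMIT_BYTES, hard)
    resource.setrlimit(resource.RLIMIT_AS, (cap, cap))


def run_with_limit():
    # Start the child with the limit applied and return what it printed.
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_with_limit())
```

On a Linux host the child should report that the allocation was denied. Other platforms may not enforce RLIMIT_AS in the same way.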
>
> For the job limit, SGE finds all the processes that belong to a job and then finds out how much memory each one uses. When the sum over all the processes of a job exceeds the limit, SGE may kill the job.
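
The summing step can be sketched roughly as follows. This assumes a Linux /proc filesystem and that the list of PIDs belonging to the job is already known; how SGE builds that list is not shown here. The function names are hypothetical, and this is not how SGE itself is implemented.

```python
import os


def vm_size_bytes(pid):
    """Return the virtual memory size of one process, read from Linux /proc."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmSize:"):
                # The line looks like "VmSize:   123456 kB".
                return int(line.split()[1]) * 1024
    return 0  # some entries (e.g. kernel threads) have no VmSize line


def job_usage_bytes(pids):
    """Sum the memory of every process that belongs to one job."""
    total = 0
    for pid in pids:
        try:
            total += vm_size_bytes(pid)
        except (FileNotFoundError, ProcessLookupError):
            pass  # the process exited between listing and reading
    return total


def over_job_limit(pids, limit_bytes):
    """True when the job as a whole uses more than its limit."""
    return job_usage_bytes(pids) > limit_bytes


if __name__ == "__main__":
    # Treat the current process as a one-process "job".
    print(job_usage_bytes([os.getpid()]))
```

The point is that the comparison is made on the total, so no single process has to cross the limit for the job to be over it.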
>
> As an example, let's say we have a job with a limit of 2GB of memory. These are the steps involved:
>
> 1) SGE starts the job by executing the job script with a resource limit of 2 GB.
>
> 2) If any process tries to exceed 2GB of memory usage, the kernel will not allow that process to get more memory.
>
> 3) However, most operating system kernels do not know what a "job" is, so it is possible for a job to start 2 processes, each getting 1.5 GB of memory.
>
> 4) SGE has its own mechanism to find out which process belongs to which job. So when it finds 2 processes belonging to a job with a 2GB memory limit whose sum is greater than 2GB (here, 2 * 1.5 = 3GB), SGE kills the job if the 2GB limit is a "hard" limit rather than a "soft" one.
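
The arithmetic in steps 2 to 4 can be written out as a small Python sketch. The function name check_job and its return strings are made up for illustration. What SGE does when only a soft limit is exceeded is not covered in this thread, so the sketch just reports that case without acting on it.

```python
GB = 1024 ** 3


def check_job(process_bytes, limit_bytes, hard=True):
    """Apply the per-process check and the per-job check from the example."""
    # Step 2: the kernel only looks at each process on its own.
    if any(used > limit_bytes for used in process_bytes):
        return "process over limit"
    # Steps 3 and 4: the job-level check looks at the sum.
    total = sum(process_bytes)
    if total <= limit_bytes:
        return "within limit"
    return "kill" if hard else "over soft limit"


if __name__ == "__main__":
    two_processes = [int(1.5 * GB), int(1.5 * GB)]
    print(check_job(two_processes, 2 * GB, hard=True))
    print(check_job(two_processes, 2 * GB, hard=False))
```

With two processes of 1.5 GB each, neither trips the per-process check, but the 3 GB total is over the 2 GB job limit, so the hard-limit case comes back as "kill".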
>
>  -Ron
>
>
>
>>
>>
>> On Tue, Sep 23, 2008 at 8:51 PM, Ron Chen <ron_chen_123 at yahoo.com> wrote:
>> > --- On Wed, 9/24/08, Mag Gam <magawake at gmail.com> wrote:
>> >> Yah, that question says how to do it, but I wanted to learn
>> >> the internals on exactly how it does it :-)
>> >
>> > Then it's not your fault, but rather mine! 8-) I thought your main goal was to find out how to do it and test it.
>> >
>> >
>> > If you want to know how SGE does it, Reuti summarized it in this email:
>> >
>> > http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=25932
>> >
>> > And if you want to play with the SGE module that finds the resource usage of the job, and see how it enforces the job limit by summing up all the processes in a job, you can find some design docs in the SGE source directory.
>> >
>> >  -Ron
>> >
>> >
>> >
>> >
>> >
>> >>
>> >> So it basically sets the ulimit of the process and user to control the
>> >> memory, eh?
>> >>
>> >> Sorry for the redundant question!
>> >>
>> >>
>> >> On Tue, Sep 23, 2008 at 8:34 PM, Ron Chen <ron_chen_123 at yahoo.com> wrote:
>> >> > Wasn't it asked and answered before?
>> >> >
>> >> > http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=25942
>> >> >
>> >> > And for the 2nd question: I believe SGE sets the OS's process resource limit when you submit a job with limit. You can find out the answer by putting "limit" or "ulimit" in the job.
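
For a job written in Python rather than shell, a rough equivalent of putting "ulimit" in the job is to print the limits the process actually inherited. This is only a sketch: which of these limits a given memory request changes is not something this thread settles, and the labels and the report_limits name are illustrative.

```python
import resource

# Limits to report, labelled with the matching shell "ulimit" flag.
LIMITS = {
    "address space (ulimit -v)": resource.RLIMIT_AS,
    "data segment (ulimit -d)": resource.RLIMIT_DATA,
    "stack (ulimit -s)": resource.RLIMIT_STACK,
}


def describe(value):
    """Render one limit value; RLIM_INFINITY means no limit is set."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def report_limits():
    """Return one line per limit with its soft and hard values."""
    lines = []
    for label, which in LIMITS.items():
        soft, hard = resource.getrlimit(which)
        lines.append(f"{label}: soft={describe(soft)} hard={describe(hard)}")
    return lines


if __name__ == "__main__":
    print("\n".join(report_limits()))
```

Running it once from a login shell and once as a submitted job, then comparing the two outputs, should show which limits were set for the job.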
>> >> >
>> >> >  -Ron
>> >> >
>> >> >
>> >> > --- On Wed, 9/24/08, Mag Gam <magawake at gmail.com> wrote:
>> >> >> From: Mag Gam <magawake at gmail.com>
>> >> >> Subject: [GE users] SGE memory management
>> >> >> To: users at gridengine.sunsource.net
>> >> >> Date: Wednesday, September 24, 2008, 8:26 AM
>> >> >> I am curious how SGE allocates memory to a process.
>> >> >>
>> >> >> Let's say you submit a job and you specify to only use 16G
>> >> >> of memory. All the exec hosts have 128GB of memory. Does SGE cap the
>> >> >> process to 16G? I thought that was a function of the OS to control how
>> >> >> memory is managed and not SGE. Sorry if this is a newbie question.
>> >> >>
>> >> >> TIA
>> >> >>
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> >> >> For additional commands, e-mail: users-help at gridengine.sunsource.net



