[GE users] SGE memory management

Reuti reuti at Staff.Uni-Marburg.DE
Wed Sep 24 08:31:26 BST 2008


On 24.09.2008 at 03:46, Ron Chen wrote:

> --- On Wed, 9/24/08, Mag Gam <magawake at gmail.com> wrote:
>> Can you elaborate more on #4? How does the mechanism work?
>
> It's done by PDC - the Portable Data Collector. It can find all  
> processes from a job.
>
>> Is there a
>> grace period if the job goes over 2GB? or does it kill
>> immediately?

Please also have a look at `man queue_conf`, section RESOURCE LIMITS,
where it's explained what happens if you set a value for s_vmem in
addition to h_vmem to warn the job. Don't confuse this with the soft
limits of the kernel: those have the same effect as hard limits, and
are just the custom values set by the user.
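
A minimal sketch of how a job could react to such a warning. As far as I
recall, the RESOURCE LIMITS section says a job exceeding s_vmem is sent
SIGXCPU, which it can catch, while exceeding h_vmem ends in SIGKILL; treat
the signal choice as an assumption and check the man page. The names
`state`, `on_soft_limit` and `run` are made up for illustration.

```python
import signal

# Shared flag, flipped once the soft-limit warning arrives.
state = {"warned": False}

def on_soft_limit(signum, frame):
    """Record the warning so the main loop can wind down cleanly."""
    state["warned"] = True

# Assumption: the s_vmem warning is delivered as SIGXCPU.
signal.signal(signal.SIGXCPU, on_soft_limit)

def run(work_items):
    """Process items until done or until the warning signal was seen."""
    done = []
    for item in work_items:
        if state["warned"]:
            break  # stop early; a real job might write a checkpoint here
        done.append(item * 2)  # stand-in for real work
    return done
```

The handler only sets a flag; the actual cleanup happens in the main loop,
which keeps the signal handler itself trivial.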

-- Reuti


>
> I think when SGE finds out, it does not have a "grace period" for  
> the job. However, you can write a custom "terminate_method" to  
> define your own way to kill the job.
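
A rough sketch of what such a custom terminate method could look like:
send SIGTERM first, wait for a grace period, and only then send SIGKILL.
This is an illustration, not something shipped with SGE. It assumes the
script gets the job's pid as its first argument (for example via the
`$job_pid` pseudo-variable, which I believe queue_conf documents for these
methods; please verify) and the grace period in seconds as its second.

```python
import os
import signal
import sys
import time

def terminate_with_grace(pid, grace_seconds, poll_interval=0.1):
    """Send SIGTERM, wait up to grace_seconds, then SIGKILL if still there."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "already gone"
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 only checks that the pid exists
        except ProcessLookupError:
            return "exited after SIGTERM"
        time.sleep(poll_interval)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "exited after SIGTERM"
    return "killed with SIGKILL"

if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: this_script.py <pid> <grace seconds>
    print(terminate_with_grace(int(sys.argv[1]), float(sys.argv[2])))
```

Note that this signals a single pid only. A job with several processes
would need the process group or some other way to reach all of them.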
>
> http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/htmlman/htmlman5/queue_conf.html
>
> You can easily test a terminate_method with grace period, and with  
> the signal you want to send to the job with a few test jobs. As for  
> the test case, a simple C program that allocates memory in a loop  
> and prints whether the allocation was successful or not would be a  
> perfect one.
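
A test case along those lines, sketched in Python rather than C to keep it
short. It allocates memory chunk by chunk and prints whether each
allocation succeeded. The function name and parameters are made up for
this example.

```python
def allocate_in_chunks(chunk_mb, max_chunks):
    """Allocate chunk_mb MiB at a time; return how many chunks succeeded."""
    held = []  # keep references so the memory is not released
    for i in range(max_chunks):
        try:
            held.append(bytearray(chunk_mb * 1024 * 1024))
        except MemoryError:
            print(f"chunk {i + 1}: allocation FAILED with {i * chunk_mb} MiB held")
            break
        print(f"chunk {i + 1}: allocation ok, {(i + 1) * chunk_mb} MiB held")
    return len(held)
```

Submitted as a job with a memory limit smaller than `chunk_mb * max_chunks`,
the output should show where allocations start to fail, or the job gets
killed first, depending on which limit triggers.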
>
>  -Ron
>
>
>
>> Is it a good practice to do this? Also, can you give me an example of
>> when and why I would need to kill the user process (I know I asked
>> this question before and I should know the answer, but I want to be
>> certain I am on the right track and approaching the problem with
>> sanity).
>>
>> TIA
>>
>>
>>
>>
>> On Tue, Sep 23, 2008 at 9:23 PM, Ron Chen <ron_chen_123 at yahoo.com> wrote:
>>> --- On Wed, 9/24/08, Mag Gam <magawake at gmail.com> wrote:
>>>> Cool. To sum up the kernel takes care of it. This is what I
>>>> was looking for. Thanks again Ron!
>>>
>>> Actually it is a bit more complicated than that.
>>>
>>> There are 2 limits: the process limit and the job limit. A Grid Engine
>>> job is a collection of processes started by the job script. Also, most
>>> kernels do not have a concept of a job; kernels deal with processes only.
>>>
>>> So, when SGE starts a job, the SGE execd sets the process limit, and
>>> this limit is enforced by the kernel. Any process that wants to
>>> allocate more memory needs to ask the kernel, and the kernel may or
>>> may not allow the process to get more memory based on this limit.
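
The kernel-enforced process limit can be tried out without SGE. The sketch
below starts a child process under an address-space limit and lets it
attempt one allocation. It assumes Linux and assumes the relevant limit is
RLIMIT_AS; which rlimit SGE actually sets is not stated in this thread.
`try_allocation` and `CHILD_CODE` are names invented for the example.

```python
import resource
import subprocess
import sys

# Code run by the child: attempt one allocation and report the outcome.
CHILD_CODE = """
import sys
try:
    buf = bytearray(int(sys.argv[1]))
    print("allocation succeeded")
except MemoryError:
    print("allocation refused")
"""

def try_allocation(limit_bytes, alloc_bytes):
    """Run a child under an address-space limit and report what happened."""
    def set_limit():
        # Runs in the child just before exec; lowers only the soft limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))

    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(alloc_bytes)],
        preexec_fn=set_limit, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

For example, `try_allocation(1024**3, 2 * 1024**3)` asks for 2 GiB under a
1 GiB limit. The request is refused by the kernel and the child is not
killed, which matches the behaviour described above.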
>>>
>>> And for the job limit, SGE finds out all the processes that belong to
>>> a job. SGE then finds out how much memory each one uses. When the sum
>>> over all the processes of a job exceeds the limit, SGE may kill the job.
>>>
>>> As an example, let's say we have a job with a limit of 2 GB of memory.
>>> These are the steps involved:
>>>
>>> 1) SGE starts the job by executing the job script with a resource
>>> limit of 2 GB.
>>>
>>> 2) If any process exceeds 2 GB of memory usage, the kernel would not
>>> allow that process to get more memory.
>>>
>>> 3) However, most operating system kernels do not know what a "job" is,
>>> so it is possible for a job to start 2 processes, each getting 1.5 GB
>>> of memory.
>>>
>>> 4) SGE has its own mechanism to find out which process belongs to
>>> which job. So when it finds that there are 2 processes belonging to a
>>> job with a 2 GB memory limit, but their sum is greater than 2 GB
>>> (here, 2 * 1.5 = 3 GB), SGE kills the job if the 2 GB limit is a
>>> "hard" limit rather than a "soft" one.
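
The arithmetic of steps 2) to 4) as a toy model. This is not SGE code, and
`check_job` is a name made up for the example. It only shows why the
per-process check passes while the per-job sum does not.

```python
GB = 1024 ** 3

def check_job(process_usage_bytes, limit_bytes):
    """Compare each process and the job total against the same limit."""
    total = sum(process_usage_bytes)
    return {
        # What the kernel sees: every process on its own.
        "per_process_ok": all(u <= limit_bytes for u in process_usage_bytes),
        # What the job-level accounting sees: the sum over all processes.
        "job_total": total,
        "job_over_limit": total > limit_bytes,
    }

# Two processes of 1.5 GB each in a job with a 2 GB limit.
report = check_job([int(1.5 * GB), int(1.5 * GB)], 2 * GB)
print(report)
```

Each process stays under 2 GB, so the kernel has no reason to refuse
anything, but the job total of 3 GB is over the limit.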
>>>
>>>  -Ron
>>>
>>>
>>>
>>>>
>>>>
>>>> On Tue, Sep 23, 2008 at 8:51 PM, Ron Chen <ron_chen_123 at yahoo.com> wrote:
>>>>> --- On Wed, 9/24/08, Mag Gam <magawake at gmail.com> wrote:
>>>>>> Yah, that question says how to do it, but I wanted to learn
>>>>>> the internals on exactly how it does it :-)
>>>>>
>>>>> Then it's not your fault, but rather mine! 8-) I thought your main
>>>>> goal was to find out how to do it and test it.
>>>>>
>>>>>
>>>>> If you want to know how SGE does it, Reuti summarized it in this email:
>>>>>
>>>>> http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=25932
>>>>>
>>>>> And if you want to play with the SGE module that finds the resource
>>>>> usage of the job, and how it enforces the job limit by summing up all
>>>>> the processes in a job, you can find some design docs in the SGE
>>>>> source directory.
>>>>>
>>>>>  -Ron
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> So it basically sets the ulimit of the process and user to
>>>>>> control the memory, eh?
>>>>>>
>>>>>> Sorry for the redundant question!
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 23, 2008 at 8:34 PM, Ron Chen <ron_chen_123 at yahoo.com> wrote:
>>>>>>> Wasn't it asked and answered before?
>>>>>>>
>>>>>>> http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=25942
>>>>>>>
>>>>>>> And for the 2nd question: I believe SGE sets the OS's process
>>>>>>> resource limit when you submit a job with a limit. You can find out
>>>>>>> the answer by putting "limit" or "ulimit" in the job.
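
For a job that is a Python script, roughly the same check can be done from
inside the job with the standard `resource` module. A small sketch; the
selection of limits and the name `current_limits` are just for
illustration, and values are in bytes, whereas `ulimit` usually prints KiB.

```python
import resource

# A few memory-related limits and the ulimit flag they roughly match.
LIMITS = {
    "address space (ulimit -v)": resource.RLIMIT_AS,
    "data segment (ulimit -d)": resource.RLIMIT_DATA,
    "stack (ulimit -s)": resource.RLIMIT_STACK,
}

def current_limits():
    """Return {name: (soft, hard)}, with infinity shown as 'unlimited'."""
    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    report = {}
    for name, which in LIMITS.items():
        soft, hard = resource.getrlimit(which)
        report[name] = (show(soft), show(hard))
    return report

for name, (soft, hard) in current_limits().items():
    print(f"{name}: soft={soft} hard={hard}")
```

Comparing the output of a job submitted with and without a memory request
shows which process limits were set for it.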
>>>>>>>
>>>>>>>  -Ron
>>>>>>>
>>>>>>>
>>>>>>> --- On Wed, 9/24/08, Mag Gam <magawake at gmail.com> wrote:
>>>>>>>> From: Mag Gam <magawake at gmail.com>
>>>>>>>> Subject: [GE users] SGE memory management
>>>>>>>> To: users at gridengine.sunsource.net
>>>>>>>> Date: Wednesday, September 24, 2008, 8:26 AM
>>>>>>>> I am curious how SGE allocates memory to a process.
>>>>>>>>
>>>>>>>> Let's say you submit a job and you specify to only use 16G
>>>>>>>> of memory. All the exec hosts have 128GB of memory. Does SGE
>>>>>>>> cap the process to 16G? I thought that was a function of the
>>>>>>>> OS to control how memory is managed and not SGE. Sorry if this
>>>>>>>> is a newbie question.
>>>>>>>>
>>>>>>>> TIA
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net



