[GE users] comprehensive -l limit documentation

Reuti reuti at staff.uni-marburg.de
Thu Jan 24 16:53:49 GMT 2008


Hi,

On 24.01.2008 at 16:47, Alexandre Racine wrote:

> Hi, doing more tests...
>
>
> Weirdly, putting "NONE" as the default in qconf -mc like this...
> h_vmem   h_vmem   MEMORY   <=   YES   YES   NONE   0
> makes some programs abort with a kill signal.
>
> In the notification, I received: "failed assumedly after job
> because: job 431.7 died through signal KILL (9)"
>
> Is this normal? Running the program on the command line works, of
> course. Running it on machines that do not have the complex
> value works too.

It could be that in this case the limit is simply set to zero. Is there
any further information in the messages file of the node?

-- Reuti


> Thanks.
>
>
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Thu 2008-01-24 06:03
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] comprehensive -l limit documentation
>
> Hi,
>
> On 23.01.2008 at 22:20, Alexandre Racine wrote:
>
>> << --- Q2 Is this configuration optimal for my needs? I don't
>> << want any limit by default, but obviously I want to be able to
>> << request some reserved memory for some tasks.
>>
>> <To me it looks okay. But you will need a default limit; otherwise
>> <SGE can't deduct the memory of already running jobs from what
>> <remains on a node. If you check only the actual consumption,
>> <it may vary over the runtime of the job (and with later submitted
>> <jobs), so it is no reliable indicator of what's really left.
>>
>>
>>
>> Is putting a default limit of, let's say, 3G in qconf -mc for
>> h_vmem the same as putting -l h_vmem=3G on the command line?
>
> yes.
>
>> Because the latter will make the job see a maximum of only 3G, and I
>> don't really want that since most jobs will not work correctly if I
>> do that.
>
> Some programs (like Gaussian) don't like h_stack being equal to h_vmem
> (h_stack and h_data are set in addition when you set h_vmem). They
> work again if you limit h_stack further, to around 128M.
>
>> The other point is that I don't currently know how much memory each
>> program will take while running. I guess I could do an array with
>> qacct -j $JOBARRAY | grep maxvmem :)
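
A rough sketch of that idea in Python, assuming each accounting record
carries a "maxvmem" line whose value uses the usual SGE memory suffixes
(lowercase k/m/g as powers of 1000, uppercase K/M/G as powers of 1024).
The helper names and the sample text are made up for illustration; real
qacct output has many more fields.

```python
import re

# Multipliers as assumed above: lowercase = powers of 1000,
# uppercase = powers of 1024, no suffix = plain bytes.
_MULT = {"": 1, "k": 1000, "K": 1024, "m": 1000**2, "M": 1024**2,
         "g": 1000**3, "G": 1024**3}


def mem_to_bytes(spec):
    """Convert a memory value such as '1.500G' or '769.1M' to bytes."""
    m = re.fullmatch(r"([0-9]*\.?[0-9]+)([kKmMgG]?)", spec.strip())
    if m is None:
        raise ValueError(f"unrecognised memory value: {spec!r}")
    return int(float(m.group(1)) * _MULT[m.group(2)])


def peak_maxvmem(qacct_text):
    """Return the largest maxvmem (in bytes) over all records in the text."""
    values = []
    for line in qacct_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "maxvmem":
            values.append(mem_to_bytes(fields[1]))
    if not values:
        raise ValueError("no maxvmem lines found")
    return max(values)


# Invented sample resembling two task records of an array job.
SAMPLE = """\
==============================================================
jobnumber    431
taskid       1
maxvmem      1.500G
==============================================================
jobnumber    431
taskid       2
maxvmem      812.000M
"""

print(peak_maxvmem(SAMPLE))
```

The peak over all tasks, plus some headroom, would be a starting point
for the h_vmem request of later runs of the same program.
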
>>
>>
>> You were saying that putting -l h_vmem=10G will change the ulimit
>> in Linux for that job, but will it also reserve that amount of
>> memory for the job, so that other jobs won't be able to use it?
>
> If you made the complex consumable and defined it for every exec
> host: yes
>
> -- Reuti
>
>
>>
>>
>> Thanks.
>>
>>
>>
>>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Wed 2008-01-23 15:30
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] comprehensive -l limit documentation
>>
>> Hi,
>>
>> On 23.01.2008 at 20:28, Alexandre Racine wrote:
>>
>>> --- Q1 : So I have done this.
>>> qconf -mc
>>> #name   shortcut type   relop requestable consumable default urgency
>>> h_vmem  h_vmem   MEMORY <=    YES         YES        NONE    0
>>>
>>> qconf -me server1
>>> complex_values        h_vmem=16G
>>>
>>> qconf -me server2
>>> complex_values        h_vmem=32G
>>>
>>> ... and in the script I added -l h_vmem=20G and the job will run on
>>> server2.
>>
>> Great!
>>
>>> I have played around with these. Is it a bug that I can put
>>> h_vmem=200G in qconf -me server1 and get no error message?
>>> (The server only has 32G of memory and no swap.)
>>
>> This is because the default of the -w switch is n (none) for qsub.
>> You can use -w e and should see something like: No suitable queues.
>> If you like, you can put this in the sge_request file as default.
>>
>>
>>> --- Q2 Is this configuration optimal for my needs? I don't
>>> want any limit by default, but obviously I want to be able to
>>> request some reserved memory for some tasks.
>>
>> To me it looks okay. But you will need a default limit; otherwise
>> SGE can't deduct the memory of already running jobs from what
>> remains on a node. If you check only the actual consumption,
>> it may vary over the runtime of the job (and with later submitted
>> jobs), so it is no reliable indicator of what's really left.
>>
>>
>>> --- Q3 Also, let's say that I have a couple of programs running on a
>>> 32G RAM server and that only 10G are free. If I ask for 15G with
>>> "-l h_vmem=15G" and the host limit is "complex_values h_vmem=32G",
>>> will SGE see this and wait for the free memory before launching
>>> the job?
>>
>> Yes. To check what is left on the host you can use: qhost -F
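
The bookkeeping behind that "yes" can be pictured with a toy model in
Python. This is only a sketch of the consumable idea, not SGE's
scheduler code; the Host class and the job names are hypothetical. The
point it shows is that the decision is based on what running jobs have
reserved, not on what they currently use.

```python
GIB = 1024**3


class Host:
    """Toy model of one exec host with an h_vmem consumable."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity   # like complex_values h_vmem=... on the host
        self.running = {}          # job id -> reserved bytes

    def remaining(self):
        """Capacity minus everything reserved by running jobs."""
        return self.capacity - sum(self.running.values())

    def try_start(self, job_id, request):
        """Reserve `request` bytes if they fit; return True if the job starts."""
        if request > self.remaining():
            return False           # job stays pending
        self.running[job_id] = request
        return True

    def finish(self, job_id):
        """Hand the job's reservation back to the host."""
        self.running.pop(job_id)


server = Host("server2", 32 * GIB)
server.try_start("job-a", 12 * GIB)
server.try_start("job-b", 10 * GIB)      # 10G is now left on the host

started = server.try_start("job-c", 15 * GIB)
print(started, server.remaining() // GIB)            # job-c has to wait

server.finish("job-a")                               # 12G is handed back
print(server.try_start("job-c", 15 * GIB), server.remaining() // GIB)
```
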
>>
>> -- Reuti
>>
>>
>>> -----Original Message-----
>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>> Sent: Wed 2008-01-23 10:53
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] comprehensive -l limit documentation
>>>
>>> Hi,
>>>
>>> On 23.01.2008 at 16:09, Alexandre Racine wrote:
>>>
>>>> Mmm, well is my syntax correct?
>>>>
>>>> In the bash file I have put this, which should ask for 20GB of
>>>> memory.
>>>> #$-l h_vmem=20G
>>>>
>>>> When launching the job, SGE sent the job to a machine with 14G
>>>> free...
>>>>
>>>> qstat
>>>> all.q at server1.com   BIP   1/3       0.12     lx24-amd64
>>>>     400 0.56000 Merli racine      r     01/23/2008 10:10:05     1
>>>>
>>>> $ qhost
>>>> HOSTNAME   ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>>> ------------------------------------------------------------------
>>>> global     -               -     -       -       -       -       -
>>>> SERVER2    lx24-amd64      8  1.13   30.4G    4.4G    1.9G     0.0
>>>> SERVER1    lx24-amd64      4  0.12   14.6G  769.1M    2.0G     0.0
>>>>
>>>>
>>>>
>>>> Can SGE use memory from another machine?
>>>
>>> Of course not. For that you would need something like
>>> http://www.kerrighed.org/wiki/index.php/Main_Page.
>>>
>>> In your setup h_vmem is for now only a limit per job, not a
>>> consumable per host that SGE decreases and increases depending on
>>> the jobs submitted to this machine. To get that, you would need to:
>>>
>>> - make h_vmem consumable with a proper default consumption in the
>>> complex definition (qconf -mc)
>>> - give every machine a sensible value for its built-in memory
>>> (qconf -me <node>)
>>>
>>> If you have it defined this way both as a queue limit and as an exec
>>> host limit, the smaller of the two values will be used for each job.
>>>
>>> -- Reuti
>>>
>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>> Alexandre Racine
>>>> 514-461-1300 ext. 3304
>>>> alexandre.racine at mhicc.org
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>>> Sent: Tue 2008-01-22 17:58
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] comprehensive -l limit documentation
>>>>
>>>> Hi,
>>>>
>>>> On 22.01.2008 at 23:15, Alexandre Racine wrote:
>>>>
>>>>> (Bump)
>>>>> For example, if my job absolutely needs 20G of memory (reservation),
>>>>> which parameter should I use from this list?
>>>>>
>>>>> -l s_data=20G
>>>>> -l h_data=20G
>>>>> -l s_rss=20G
>>>>> -l h_rss=20G
>>>>> -l s_vmem=20G
>>>>> -l h_vmem=20G
>>>>
>>>> You will need just h_vmem. This will set:
>>>>
>>>> data seg size
>>>> stack size
>>>> virtual memory
>>>>
>>>> as ulimits in the kernel and, besides this, enable the memory
>>>> control in SGE, which observes the job's memory consumption.
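
To make the three limits concrete, here is a minimal Python sketch of
the same kernel mechanism using the standard resource module (Unix
only). It is not SGE's own code; the function names are hypothetical,
and the optional separate stack cap is an illustrative assumption.

```python
import resource
import subprocess


def limits_for_h_vmem(h_vmem_bytes, h_stack_bytes=None):
    """Map an h_vmem value onto the three rlimits listed above.

    By default the stack limit equals h_vmem; h_stack_bytes optionally
    caps the stack at a smaller value.
    """
    if h_stack_bytes is None:
        stack = h_vmem_bytes
    else:
        stack = min(h_stack_bytes, h_vmem_bytes)
    return {
        resource.RLIMIT_DATA: (h_vmem_bytes, h_vmem_bytes),   # data seg size
        resource.RLIMIT_STACK: (stack, stack),                # stack size
        resource.RLIMIT_AS: (h_vmem_bytes, h_vmem_bytes),     # virtual memory
    }


def run_limited(argv, h_vmem_bytes, h_stack_bytes=None):
    """Run argv in a child process with the limits applied before exec."""
    limits = limits_for_h_vmem(h_vmem_bytes, h_stack_bytes)

    def apply_limits():
        # Runs in the child only, so the calling process is unaffected.
        for which, value in limits.items():
            resource.setrlimit(which, value)

    return subprocess.run(argv, preexec_fn=apply_limits)
```

For example, run_limited(["./my_program"], 20 * 1024**3) would start a
(hypothetical) program that cannot map more than 20G of address space;
allocations beyond that fail inside the program.
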
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>>
>>>>> Alexandre Racine
>>>>> 514-461-1300 ext. 3304
>>>>> alexandre.racine at mhicc.org
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Alexandre Racine [mailto:Alexandre.Racine at mhicc.org]
>>>>> Sent: Wed 2008-01-16 10:59
>>>>> To: users at gridengine.sunsource.net
>>>>> Subject: RE: [GE users] comprehensive -l limit documentation
>>>>>
>>>>> It seems the document you point to is more about machine statistics
>>>>> than about -l limit reservation. It's all good :) but what I
>>>>> really need is a comprehensive guide on resource reservation.
>>>>>
>>>>> For example, if my job absolutely needs 20G of memory (reservation),
>>>>> which parameter should I use from this list?
>>>>>
>>>>> -l s_data=20G
>>>>> -l h_data=20G
>>>>> -l s_rss=20G
>>>>> -l h_rss=20G
>>>>> -l s_vmem=20G
>>>>> -l h_vmem=20G
>>>>>
>>>>>
>>>>>
>>>>> Alexandre Racine
>>>>> Special Projects
>>>>> 514-461-1300 ext. 3304
>>>>> alexandre.racine at mhicc.org
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Rayson Ho [mailto:rayrayson at gmail.com]
>>>>> Sent: Tue 2008-01-15 16:46
>>>>> To: users at gridengine.sunsource.net
>>>>> Subject: Re: [GE users] comprehensive -l limit documentation
>>>>>
>>>>> Did you read $SGE_ROOT/doc/load_parameters.asc before??
>>>>>
>>>>> Rayson
>>>>>
>>>>>
>>>>>
>>>>> On Jan 15, 2008 3:44 PM, Alexandre Racine
>>>>> <Alexandre.Racine at mhicc.org> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Somehow, we now have to use some limits/resource reservations for
>>>>>> some jobs. Looking around man qsub, man complex, man queue_conf
>>>>>> and a little bit on the SGE website, I can't really find any
>>>>>> comprehensive documentation about the subject.
>>>>>>
>>>>>> For example, I saw -l mem_total=6G on the web, but can't find
>>>>>> it in the official documentation.
>>>>>>
>>>>>> Searching for "mem_total" in the administration guide gives 0
>>>>>> results (for 6.0), and in the user guide there is the listing of
>>>>>> "qconf -sc", but no descriptions.
>>>>>>
>>>>>> I use SGE 6.0.
>>>>>>
>>>>>> Is there a comprehensive guide on resource reservation somewhere?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Alexandre Racine
>>>>>> Special Projects
>>>>>> 514-461-1300 ext. 3304
>>>>>> alexandre.racine at mhicc.org
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>>



