[GE users] qhost MEMUSE

reuti reuti at staff.uni-marburg.de
Wed Feb 18 16:16:23 GMT 2009


On 18.02.2009 at 16:46, futurity wrote:

> I didn't know about this "qacct" command. It seems to be very useful.
> The maxvmem value doesn't seem to match the increase in memory use we
> actually observed, though, which is a shame.

The accounting record is written after the job has finished. Is it a plain
serial job that isn't recorded correctly?


> It takes quite a while to return results for a particular job. I take it
> that this is because I've never cleared up the file which stores all this
> information. Is there a way of pruning it?

You can just delete/rename the file $SGE_ROOT/default/common/accounting
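A minimal sketch of the rename approach, assuming the default cell (as in
the path above) and that sge_qmaster starts a fresh accounting file when it
next writes a record. The helper name rotate_accounting and the dated
suffix are illustrative choices, not part of SGE:

```shell
# Hypothetical helper: move the accounting file aside under a dated name.
# $1 overrides the common directory; the default assumes the "default" cell.
rotate_accounting() {
    common_dir=${1:-"$SGE_ROOT/default/common"}
    acct="$common_dir/accounting"
    if [ ! -f "$acct" ]; then
        echo "no accounting file in $common_dir" >&2
        return 1
    fi
    archive="$acct.$(date +%Y%m%d)"
    mv "$acct" "$archive" || return 1
    # Print the archive path so old jobs can still be queried from it.
    echo "$archive"
}
```

Old records should remain queryable by pointing qacct at the archived file
with its -f option, e.g. qacct -f <archived file> -j 79563.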

-- Reuti


> Thanks again Reuti for all your help.
>
> Neil
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 17 February 2009 19:00
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] qhost MEMUSE
>
> On 17.02.2009 at 19:52, futurity wrote:
>
>> We are using vf as a consumable.
>>
>> Our users have used our legacy grid for many years without any
>> restrictions. As a result they are very worried about configuring the
>> grid to kill off their jobs when they exceed specific limits (although I
>> see the benefits of killing off such rogue jobs).
>>
>> Many users have the qsub commands so embedded in their scripts (scripts
>> that submit other scripts, etc.) that it'll be really hard for them to
>> change their arguments. Users submitting jobs with large memory
>> requirements will have to state their required memory, as otherwise
>> their jobs will push machines into swap, but for most users we'll have
>> to use some sensible default values for vf.
>>
>> To do this I have to calculate a memory value that most jobs will fall
>> under. I can't just make this a large number, otherwise our 8-core
>> machines will have all their vf allocated while some cores sit unused. I
>> also have to police user jobs, so that if I see jobs exceeding these
>> default values I can force their owners to declare their memory
>> requirements.
>>
>> After reading your and Andreas' responses, I can now see why it's not
>> quite as simple as monitoring the memory consumed by a user's job. I
>> think I've also seen that jobs have a knock-on effect on the OS, so a job
>> may be 256MB in size but may cause the OS to use additional memory in
>> support services. Do the grid processes on the machine also consume
>> significantly more memory for each additional job that runs?
>>
>> I'm wondering if the easiest solution is to measure the memory use when
>> no jobs are running on a machine, then submit lots of jobs of the same
>> type until all of the machine's slots are filled, subtract the "no jobs"
>> memory use from the "all slots filled" memory use, and divide the result
>> by the number of slots?
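The estimate proposed above is simple arithmetic. A sketch, assuming both
readings are taken in MB from the "used" figure excluding buffers and cache
(the same basis qhost reports); per_slot_mb is a hypothetical helper that
rounds up so the default errs on the high side:

```shell
# Hypothetical helper: average extra memory per slot, rounded up.
# $1 = used MB with no jobs, $2 = used MB with all slots filled,
# $3 = number of slots on the host.
per_slot_mb() {
    idle_mb=$1
    full_mb=$2
    slots=$3
    echo $(( (full_mb - idle_mb + slots - 1) / slots ))
}

per_slot_mb 1200 9400 8    # prints 1025
```

The sample figures are made up. An average like this hides the spread
between individual jobs, so it is only as good as the test jobs are
representative.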
>
> The maxvmem used is also listed in the accounting record of the job (at
> the end):
>
> $ qacct -j 79563
> ....
> maxvmem      282.789M
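To compare many jobs against a default vf value, the maxvmem line can be
pulled out and normalised. A sketch, assuming qacct prints the value with a
K, M or G suffix as in the example above (no suffix is treated as bytes);
maxvmem_mb is a hypothetical helper:

```shell
# Hypothetical helper: read "qacct -j" output on stdin, print maxvmem in MB.
maxvmem_mb() {
    awk '$1 == "maxvmem" {
        v = $2
        unit = substr(v, length(v), 1)
        # awk converts the leading numeric part, e.g. "282.789M" -> 282.789
        if (unit == "K")      mb = v / 1024
        else if (unit == "M") mb = v + 0
        else if (unit == "G") mb = v * 1024
        else                  mb = v / 1048576
        printf "%.1f\n", mb
    }'
}
```

On a live cluster this would be used as qacct -j 79563 | maxvmem_mb, which
prints one line per task for array jobs.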
>
> -- Reuti
>
>
>>
>> Neil
>>
>> -----Original Message-----
>> From: reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: 17 February 2009 18:08
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] qhost MEMUSE
>>
>> On 17.02.2009 at 18:14, futuritymmx wrote:
>>
>>> Yes, we are seeing jobs that are unable to reserve memory. At such
>>> times the physical and swap memory appear to have been completely used
>>> up.
>>>
>>> Thanks to your last response about the difference between the "free -m"
>>> value and the "qhost" value, it appears that when there is free memory
>>> it may be used by buffers and caches, but when processes need all the
>>> memory, these buffers and caches are released as expected.
>>>
>>> I'm just trying to track down which users are submitting the largest
>>> memory jobs so that they can provide accurate "vf" values to qsub. As
>>> you say, you have to add up the memory used by all the processes created
>>> by each job.
>>
>> You made vf consumable? Another option is to use h_vmem in a similar
>> manner. The difference is that h_vmem is enforced, so jobs are killed if
>> they consume too much memory; vf is only a guide.
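For comparison, a sketch of the h_vmem variant. The complex line follows
the column layout of qconf -sc (name, shortcut, type, relop, requestable,
consumable, default, urgency); the 1G default, the 16G host capacity, the
host name node01 and the script name job.sh are placeholder values:

```shell
# 1. Run "qconf -mc" and mark h_vmem consumable with a default request:
#    h_vmem  h_vmem  MEMORY  <=  YES  YES  1G  0
# 2. Tell SGE how much each host can hand out (repeat per host):
qconf -mattr exechost complex_values h_vmem=16G node01
# 3. Jobs that need more than the default request it explicitly:
qsub -l h_vmem=2G job.sh
```

Jobs submitted without -l h_vmem should then be booked and limited at the
1G default, which is exactly the enforcement trade-off discussed above.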
>>
>>
>> -- Reuti
>>
>>> Scary task!
>>>
>>> Neil
>>>
>>> -----Original Message-----
>>> From: reuti [mailto:reuti at staff.uni-marburg.de]
>>> Sent: 17 February 2009 13:31
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] qhost MEMUSE
>>>
>>> Hi,
>>>
>>> On 16.02.2009 at 20:43, futurity wrote:
>>>
>>>> Thanks Reuti.
>>>>
>>>> Is there any easy way to gather job memory usage?
>>>
>>> Well, you could write a script that sums up the consumption of all
>>> processes belonging to sge_execd. Do you need this to get information
>>> about memory used by local interactive work on a workstation outside
>>> of SGE?
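A sketch of that script idea for a Linux node, assuming procps ps and
pgrep. It sums resident memory (RSS) as an illustrative choice; put vsz in
the ps format instead to track virtual size, the quantity vf and h_vmem
describe. RSS counts shared pages once per process, so the total is an
upper bound. sum_descendants_kb is a hypothetical helper:

```shell
# Hypothetical helper: read "pid ppid rss" lines on stdin and print the
# total RSS (KB) of every process below the root pid given as $1,
# i.e. the shepherds and the job processes they started.
sum_descendants_kb() {
    awk -v root="$1" '
        { ppid[$1] = $2; rss[$1] = $3 }
        END {
            total = 0
            for (p in ppid) {
                # Walk up the parent chain until root or an unknown pid.
                q = ppid[p]
                while (q != root && (q in ppid)) q = ppid[q]
                if (q == root) total += rss[p]
            }
            print total
        }'
}
```

On a node this could be run as
ps -eo pid=,ppid=,rss= | sum_descendants_kb "$(pgrep -o sge_execd)".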
>>>
>>> -- Reuti
>>>
>>>
>>>> Regards
>>>>
>>>> Neil
>>>>
>>>> -----Original Message-----
>>>> From: reuti [mailto:reuti at staff.uni-marburg.de]
>>>> Sent: 16 February 2009 17:38
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] qhost MEMUSE
>>>>
>>>> Hi,
>>>>
>>>> On 16.02.2009 at 18:22, futurity wrote:
>>>>
>>>>> I was wondering if the MEMUSE value returned by "qhost" represents
>>>>> the memory used by all processes on a machine, or just the memory
>>>>> used by grid jobs running on it?
>>>>
>>>> It's from all processes on a node. It's the same figure you get from a
>>>> command like:
>>>>
>>>> $ free -m
>>>>
>>>> (or -g) in the "-/+ buffers/cache" line, i.e. it is system information.
>>>> Otherwise the output would read zero in an empty cluster.
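The same system-wide figure can be computed directly on Linux. A sketch,
assuming the older procps definition of that line (total minus free minus
buffers minus page cache); newer free versions use a different layout and
formula, so the numbers may not match them exactly. used_minus_cache_mb is
a hypothetical helper:

```shell
# Hypothetical helper: "used" memory excluding buffers and page cache, in MB.
# Reads /proc/meminfo (values in kB) unless another file is given as $1.
used_minus_cache_mb() {
    awk '/^MemTotal:/ { t = $2 }
         /^MemFree:/  { f = $2 }
         /^Buffers:/  { b = $2 }
         /^Cached:/   { c = $2 }
         END { printf "%d\n", (t - f - b - c) / 1024 }' "${1:-/proc/meminfo}"
}
```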
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> Regards
>>>>>
>>>>> Neil
>>>>
>>>> ------------------------------------------------------
>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=107424
>>>>
>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>
>>>
>> _____________________________________________________________________ 
>> _
>> This email has been scanned by the MessageLabs Email Security System.
>> For more information please visit http://www.messagelabs.com/email
>> _____________________________________________________________________ 
>> _
>
