[GE users] qhost MEMUSE

reuti reuti at staff.uni-marburg.de
Tue Feb 17 19:00:01 GMT 2009


On 17.02.2009 at 19:52, futurity wrote:

> We are using vf as a consumable.
>
> Our users have used our legacy grid for many years without any
> restrictions.  As a result they are very worried about configuring the
> grid to kill off their jobs when they exceed specific limits (although
> I see the benefits of killing off such rogue jobs).
>
> Many users have the qsub commands so embedded in their scripts
> (scripts that submit other scripts, etc.) that it'll be really hard
> for them to change their arguments.  For users submitting jobs with
> large memory requirements we are going to have to force them to state
> their required memory, as they will force memory to swap, but for most
> users we'll have to use some sensible default values for vf.
>
> To do this I have to try to calculate a memory value that most jobs
> will fall under.  I can't just make this a large number, otherwise our
> 8-core machines will have fully allocated vf but will also have unused
> cores.  I also have to try to police user jobs so that if I see jobs
> exceeding these default values, I can force them to declare their
> memory requirements.
>
> After reading your and Andreas' responses, I can now see why it's not
> quite as simple as monitoring the memory consumed by a user's job.  I
> think I've also seen that jobs have a knock-on effect on the OS, so a
> job may be 256MB in size, but may cause the OS to use up additional
> memory in support services.  Do the grid processes on the machine also
> consume significantly more memory when each additional job runs?
>
> I'm wondering if the easiest solution is to measure the memory use
> when no jobs are running on a machine, then submit lots of jobs of the
> same type until all of the machine's slots are filled.  Take the "no
> job memory used" away from the "all slots filled memory used" and then
> divide the result by the number of slots?
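
The arithmetic proposed above can be sketched in a few lines of Python.
This is only an illustration: the function name and the sample numbers
are made up, not measured on a real host. The result is an average per
slot, so individual jobs with higher peaks will still exceed it, and the
readings should probably exclude buffers and caches.

```python
def per_slot_memory_mb(idle_used_mb, full_used_mb, slots):
    """Estimate the average memory (in MB) that one job adds on a host.

    idle_used_mb: memory in use with no jobs running
    full_used_mb: memory in use with every slot filled
    slots:        number of slots on the host
    """
    if slots <= 0:
        raise ValueError("slots must be positive")
    return (full_used_mb - idle_used_mb) / slots


# Hypothetical readings for an 8-slot host (not real measurements).
estimate = per_slot_memory_mb(idle_used_mb=512, full_used_mb=2560, slots=8)
print(f"{estimate:.0f} MB per slot")
```

A value obtained this way could serve as a starting point for a default
vf request, with some headroom added on top.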

The maximum virtual memory a job used (maxvmem) is also listed in the
accounting record of the job (at the end):

$ qacct -j 79563
....
maxvmem      282.789M
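
To collect this value for many jobs, the qacct output could be parsed
with a small script. The following is a minimal sketch, not a tested
tool: the function names are made up, and it assumes that the K/M/G/T
suffixes in the output are multiples of 1024. It returns a list because
qacct may print more than one record for a job id (e.g. array tasks).

```python
import re
import subprocess

# Assumption: suffixes in qacct output are binary multiples.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

_MAXVMEM_RE = re.compile(r"^maxvmem\s+([0-9.]+)([KMGT]?)$")


def parse_maxvmem(qacct_output):
    """Return all maxvmem values (in bytes) found in qacct output text."""
    values = []
    for line in qacct_output.splitlines():
        match = _MAXVMEM_RE.match(line.strip())
        if match:
            number, suffix = match.groups()
            values.append(float(number) * _UNITS[suffix])
    return values


def job_maxvmem(job_id):
    """Run `qacct -j <job_id>` and return its maxvmem values in bytes."""
    result = subprocess.run(
        ["qacct", "-j", str(job_id)],
        capture_output=True, text=True, check=True,
    )
    return parse_maxvmem(result.stdout)
```

Calling job_maxvmem(79563) on a host where qacct is available would
return the value shown above converted to bytes. Comparing these values
with the vf each job requested is one way to spot jobs that exceed the
defaults.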

-- Reuti


>
> Neil
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 17 February 2009 18:08
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] qhost MEMUSE
>
> On 17.02.2009 at 18:14, futuritymmx wrote:
>
>> Yes we are experiencing jobs not being able to reserve memory.  At
>> such times the physical and swap memory appears to have been totally
>> used up.
>>
>> Thanks to your last response about the difference between the
>> "free -m" value and the "qhost" value, it appears that when there is
>> free memory it may be used by buffers and caches, but when the
>> processes require all the memory these buffers and caches disappear,
>> as expected.
>>
>> I'm just trying to track down which users are submitting the largest
>> memory jobs so that they can provide accurate "vf" values to qsub.  As
>> you say, you have to track down the sum of all the memory usage by all
>> the processes created by each job.
>
> You made vf consumable? Another option is to use h_vmem in a similar
> manner. The difference is that h_vmem is enforced, so jobs are killed
> if they consume too much memory. vf is only guidance.
>
>
> -- Reuti
>
>> Scary task!
>>
>> Neil
>>
>> -----Original Message-----
>> From: reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: 17 February 2009 13:31
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] qhost MEMUSE
>>
>> Hi,
>>
>> On 16.02.2009 at 20:43, futurity wrote:
>>
>>> Thanks Reuti.
>>>
>>> Is there any easy way to gather job memory usage?
>>
>> Well, you could write a script that sums up the memory consumption of
>> all processes belonging to the sge_execd. Do you need this to find out
>> how much memory is used by local interactive use of a workstation
>> outside of SGE?
>>
>> -- Reuti
>>
>>
>>> Regards
>>>
>>> Neil
>>>
>>> -----Original Message-----
>>> From: reuti [mailto:reuti at staff.uni-marburg.de]
>>> Sent: 16 February 2009 17:38
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] qhost MEMUSE
>>>
>>> Hi,
>>>
>>> On 16.02.2009 at 18:22, futurity wrote:
>>>
>>>> I was wondering if the MEMUSE value returned by "qhost" represents
>>>> the memory used by all processes on a machine, or just the memory
>>>> used by grid jobs running on it?
>>>
>>> It covers all processes on a node. It is the same value you get from
>>> a command like:
>>>
>>> $ free -m
>>>
>>> (or -g) on the "-/+ buffers/cache" line, i.e. it is system-wide
>>> information. If it counted only grid jobs, the output would read
>>> zero in an empty cluster.
>>>
>>> -- Reuti
>>>
>>>
>>>> Regards
>>>>
>>>> Neil
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=107424
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> ______________________________________________________________________
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email
> ______________________________________________________________________

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=108321

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


