[GE users] qhost MEMUSE

futurity neil at futurity.co.uk
Wed Feb 18 15:46:47 GMT 2009


I didn't know about the "qacct" command.  It seems to be very useful.  It's
a shame, though, that the maxvmem value doesn't seem to match up with the
increase in memory use that we actually observe.

It takes quite a while to return results for a particular job.  I take it
that this is because I've never cleared up the file which stores all this
information.  Is there a way of pruning it?

Thanks again Reuti for all your help.

Neil
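
One possible way to prune the accounting data, sketched in Python. This is an illustration under stated assumptions, not a documented SGE procedure: it assumes the colon-separated record layout described in accounting(5), with end_time as the 11th field in seconds since the epoch, and the function name prune_accounting is hypothetical. It only filters lines that have already been read; writing the result to a separate file (which qacct can read with -f) is safer than rewriting the live file in place.

```python
def prune_accounting(lines, cutoff_epoch):
    """Return the lines to keep: comments, unparseable lines, and recent records.

    Assumption: each record is colon-separated and field 11 (index 10)
    is end_time in seconds since the epoch, as in accounting(5).
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split(":")
        parseable = (
            not line.startswith("#")
            and len(fields) > 10
            and fields[10].isdigit()
        )
        # Anything we cannot interpret is kept rather than silently dropped.
        if not parseable or int(fields[10]) >= cutoff_epoch:
            kept.append(line)
    return kept
```

A copy of the accounting file could be read with readlines(), filtered with a cutoff such as time.time() - 90 * 86400, and written to a new file.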

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: 17 February 2009 19:00
To: users at gridengine.sunsource.net
Subject: Re: [GE users] qhost MEMUSE

On 17.02.2009 at 19:52, futurity wrote:

> We are using vf as a consumable.
>
> Our users have used our legacy grid for many years without any 
> restrictions.
> As a result they are very worried about configuring the grid to kill 
> off their jobs when they exceed specific limits (although I see the 
> benefits of killing off such rogue jobs).
>
> Many users have the qsub commands so embedded into their scripts 
> (scripts that submit other scripts etc), that it'll be really hard for 
> them to change their arguments.  For users submitting jobs with large 
> memory requirements we are going to have to force them to state their 
> required memory as they will force memory to swap, but for most users 
> we'll have to use some sensible default values for vf.
>
> To do this I have to try and calculate a memory value that most jobs 
> will fall under.  I can't just make this a large number, otherwise our 
> 8 core machines will have fully allocated vf, but will also have 
> unused cores.  I also have to try and police user jobs so that if I 
> see jobs exceeding these default values, then I can force them to 
> declare their memory requirements.
>
> After reading your and Andreas' responses, I can now see why it's not 
> quite as simple as monitoring the memory consumed by a user's job.  I 
> think I've also seen that jobs have a knock-on effect on the OS, so a 
> job may be 256MB in size, but may cause the OS to use up additional 
> memory in support services.  Do the grid processes on the machine also 
> consume significantly more memory when each additional job runs?
>
> I'm wondering if the easiest solution is to measure the memory use 
> when no jobs are running on a machine, then submit lots of jobs of 
> the same type until all of the machine's slots are filled.  Take the 
> "no job memory used" away from the "all slots filled memory used" and 
> then divide the result by the number of slots?
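
The estimate described above comes down to one subtraction and one division. A minimal sketch, assuming both readings are taken on the same host in the same unit; the function name and the numbers are hypothetical and are not measurements from this thread:

```python
def per_slot_memory_mb(idle_used_mb, full_used_mb, slots):
    """Average memory per job slot from two host-level readings (in MB)."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return (full_used_mb - idle_used_mb) / slots


# Hypothetical readings for an 8-slot host.
estimate = per_slot_memory_mb(idle_used_mb=600, full_used_mb=2648, slots=8)
print(estimate)
```

The result is only an average: it hides variation between jobs, and it includes any extra memory the OS uses on the jobs' behalf.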

The maxvmem used is also listed in the accounting record of the job (at
the end):

$ qacct -j 79563
....
maxvmem      282.789M
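
To collect this value for many jobs, the maxvmem line can be pulled out of the text that qacct prints. A minimal sketch, assuming the output format shown above; the suffix multipliers are an assumption based on the memory specifiers in sge_types(1), where upper case is binary and lower case is decimal, and parse_maxvmem is a hypothetical helper name:

```python
import re

# Assumed multipliers: upper case binary, lower case decimal.
_MULT = {
    "K": 1024, "M": 1024**2, "G": 1024**3,
    "k": 1000, "m": 1000**2, "g": 1000**3,
}


def parse_maxvmem(qacct_text):
    """Return the first maxvmem value in bytes, or None if there is none."""
    match = re.search(
        r"^maxvmem\s+([0-9.]+)([KMGkmg]?)\s*$", qacct_text, re.MULTILINE
    )
    if match is None:
        return None
    number, suffix = match.groups()
    return float(number) * _MULT.get(suffix, 1)
```

The text could come from subprocess.run(["qacct", "-j", job_id], capture_output=True, text=True).stdout. Only the first record is read, so a job with several accounting records (an array job, for example) would need a loop over all matches.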

-- Reuti


>
> Neil
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 17 February 2009 18:08
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] qhost MEMUSE
>
> On 17.02.2009 at 18:14, futuritymmx wrote:
>
>> Yes we are experiencing jobs not being able to reserve memory.  At 
>> such times the physical and swap memory appears to have been totally 
>> used up.
>>
>> Thanks to your last response about the difference between the "free -m"
>> value and the "qhost" value, it appears that when there is free memory 
>> it may be used by buffers and caches, but when the processes require 
>> all the memory these buffers and caches disappear as expected.
>>
>> I'm just trying to track down which users are submitting the largest 
>> memory jobs so that they can provide accurate "vf" values to qsub.  
>> As you say, you have to track down the sum of all the memory usage by 
>> all the processes created by each job.
>
> You made vf consumable? Another option is to use h_vmem in a similar 
> manner.
> The difference is that h_vmem is enforced, so jobs are killed if they 
> consume too much memory. vf is only guidance.
>
>
> -- Reuti
>
>> Scary task!
>>
>> Neil
>>
>> -----Original Message-----
>> From: reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: 17 February 2009 13:31
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] qhost MEMUSE
>>
>> Hi,
>>
>> On 16.02.2009 at 20:43, futurity wrote:
>>
>>> Thanks Reuti.
>>>
>>> Is there any easy way to gather job memory usage?
>>
>> Well, you could sum up in a script the consumption of all processes 
>> belonging to the sge_execd. Do you need this to get the information 
>> about memory used by local interactive usage of a workstation outside 
>> of SGE?
>>
>> -- Reuti
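
The suggestion above, summing over every process that descends from the execution daemon, can be sketched as a walk over the process tree. This is an illustration under assumptions: the input is a list of (pid, ppid, rss_kb) tuples such as could be parsed from `ps -eo pid,ppid,rss` on Linux, the function name is hypothetical, and summing RSS counts shared pages more than once, so the total is an upper bound rather than an exact figure.

```python
from collections import defaultdict


def subtree_rss_kb(processes, root_pid):
    """Sum RSS (in KB) over root_pid and all of its descendants.

    processes: iterable of (pid, ppid, rss_kb) tuples.
    """
    children = defaultdict(list)
    rss = {}
    for pid, ppid, rss_kb in processes:
        children[ppid].append(pid)
        rss[pid] = rss_kb

    total = 0
    stack = [root_pid]
    seen = set()
    while stack:
        pid = stack.pop()
        if pid in seen:  # guard against malformed input with cycles
            continue
        seen.add(pid)
        total += rss.get(pid, 0)
        stack.extend(children.get(pid, []))
    return total
```

Passing the PID of sge_execd as root_pid gives a total for everything it has started. Processes that detach and get a different parent are missed by this approach.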
>>
>>
>>> Regards
>>>
>>> Neil
>>>
>>> -----Original Message-----
>>> From: reuti [mailto:reuti at staff.uni-marburg.de]
>>> Sent: 16 February 2009 17:38
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] qhost MEMUSE
>>>
>>> Hi,
>>>
>>> On 16.02.2009 at 18:22, futurity wrote:
>>>
>>>> I was wondering if the MEMUSE value returned by "qhost" represents 
>>>> the memory used by all processes on a machine, or just the memory 
>>>> used by grid jobs running on it?
>>>
>>> It's from all processes on a node. It is the same figure you get from 
>>> a command like:
>>>
>>> $ free -m
>>>
>>> (or -g) in the "-/+ buffers/cache" row, i.e. system-wide information. 
>>> Otherwise the output would read zero in an empty cluster.
>>>
>>> -- Reuti
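
The figure in that row of the classic `free` layout is the used memory minus buffers and cache. A minimal sketch that recomputes it from the "Mem:" row, assuming the older column order (total, used, free, shared, buffers, cached); newer versions of free print a different layout, and the sample numbers are hypothetical:

```python
def used_without_cache_mb(free_output):
    """Compute used - buffers - cached from the classic `free -m` Mem: row."""
    for line in free_output.splitlines():
        if line.startswith("Mem:"):
            fields = line.split()
            used = int(fields[2])
            buffers = int(fields[5])
            cached = int(fields[6])
            return used - buffers - cached
    raise ValueError("no Mem: row found")


# Hypothetical output in the classic layout.
sample = """\
             total       used       free     shared    buffers     cached
Mem:          7983       7800        183          0        250       5200
-/+ buffers/cache:       2350       5633
"""
print(used_without_cache_mb(sample))
```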
>>>
>>>
>>>> Regards
>>>>
>>>> Neil
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=107424
>>>
>>> To unsubscribe from this discussion, e-mail:
>>> [users-unsubscribe at gridengine.sunsource.net].
> ______________________________________________________________________
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email 
> ______________________________________________________________________
