[GE users] Maintaining good memory usage

Aaron Turner aaron at cs.york.ac.uk
Fri Apr 30 14:27:23 BST 2004


Andreas,

Thanks for that. I will implement this in the next couple of weeks
when I have time, but I will make the users aware of this upcoming
change at the meeting next week.

We intend to move to 6.0, when we have enough time to implement
the move. We have it configured on a test machine at the moment,
while we get to grips with the changes (thankfully much of it is
very familiar).

Aaron Turner

Andreas Haas wrote:

>You could use "h_vmem" as a host-based consumable. To oversubscribe
>your SF6800 memory-wise you specify e.g. h_vmem=50G rather than 44G.
>To enforce an upper limit for particular users you could use an
>additional queue-based "h_vmem" consumable with different access
>list privileges for different users. Use of "h_vmem" is recommended
>b/c for "h_vmem" Grid Engine execd tracks/enforces memory consumption
>on a per job basis rather than only on a per process basis.
>
>Well, to rule out jobs starving because of a large memory request
>you need 6.0 resource reservation.
>
>Cheers,
>Andreas
>
>
>On Fri, 30 Apr 2004, Aaron Turner wrote:
>
>  
>
>>Hello,
>>
>>We have a shared memory (SunFire 6800) machine with
>>20 processors and 44GB RAM.
>>
>>With a series of queues of varying allowable run
>>times, appropriate subordination and suspend thresholds
>>we are getting a good, constant CPU load with few
>>problems.
>>
>>However, we are having some problems keeping memory
>>load down to acceptable levels as users are running
>>large jobs.
>>
>>What we would like to do is:-
>>    1. Continue to maximise CPU load as much as possible
>>    2. Keep memory usage within limits
>>    3. Allow users as much flexibility as possible (i.e.
>>       try to accommodate those wanting to run large memory
>>       jobs)
>>
>>
>>Currently most queues have a 4GB stack size limit, which
>>accommodates most users nicely without having to have too
>>great a plethora of queues and too complex subordination.
>>However # slots * stack size is greater than 44GB. Typically
>>many users run jobs with a memory footprint of rather less
>>than 4GB, though.
>>
>>We have a single slot queue enabled for large memory jobs
>>up to 16GB, and so the total memory usage possible is very
>>much greater than the available memory.
>>
>>Queue selection for users is via -l h_rss etc, with the users
>>suggesting what they think their memory usage will be for
>>that job.
>>
>>What is the simplest way of keeping on top of the memory
>>usage issue, both from my point of view and for that of
>>the users? My initial thought is to create a consumable
>>resource for memory, for the host, that users can request.
>>However there is no guarantee that users will actually
>>request an amount of memory that is accurate, and so users
>>may be effectively locked out by users requesting more
>>memory than they need. This would then reduce throughput.
>>Also I need a mechanism to prevent most users from requesting
>>more than 4GB so I can control the users allowed to submit
>>very large memory jobs, again to ensure that throughput is
>>maintained.
>>
>>Any hints?
>>
>>Thanks
>>
>>    Aaron Turner
>>
>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>For additional commands, e-mail: users-help at gridengine.sunsource.net

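A minimal sketch of the bookkeeping that the consumable approach
discussed above implies. It is a toy model in Python, not Grid Engine
code: the class and function names are hypothetical, the 50G, 4G and
16G figures are taken from the thread, and treating an over-limit
request as "rejected" is a simplifying assumption about how the
queue-based limit and access lists would behave together.

```python
GB = 1024 ** 3

# Figures from the thread; how they are wired together is an assumption.
HOST_H_VMEM = 50 * GB   # host-level consumable, oversubscribing 44G of RAM
DEFAULT_CAP = 4 * GB    # per-job limit for most users
LARGE_CAP = 16 * GB     # per-job limit for users on the large-memory list


class HostConsumable:
    """Tracks requested (not measured) memory against a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0

    def reserve(self, amount):
        # A job only starts if its request still fits in the capacity.
        if self.in_use + amount > self.capacity:
            return False
        self.in_use += amount
        return True

    def release(self, amount):
        # Called when a job ends; its request is returned to the pool.
        self.in_use = max(0, self.in_use - amount)


def may_request(user, amount, large_users):
    """Per-user upper limit, standing in for queue access lists."""
    cap = LARGE_CAP if user in large_users else DEFAULT_CAP
    return amount <= cap


def submit(host, user, amount, large_users):
    """Return 'rejected', 'running' or 'waiting' for one job request."""
    if not may_request(user, amount, large_users):
        return "rejected"
    return "running" if host.reserve(amount) else "waiting"
```

With one 16G job from a listed user and eight 4G jobs running, 48G of
the 50G is debited, so a further 4G request waits while a 2G request
still starts. The model debits what users request rather than what
their jobs use, which is the throughput concern raised in the original
question.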