[GE users] Maintaining good memory usage

Aaron Turner aaron at cs.york.ac.uk
Fri Apr 30 10:12:01 BST 2004


Hello,

We have a shared memory (SunFire 6800) machine with
20 processors and 44GB RAM.

With a series of queues of varying allowable run
times, appropriate subordination and suspend thresholds,
we are getting a good, constant CPU load with few
problems.

However, we are having some problems keeping memory
load down to acceptable levels as users are running
large jobs.

What we would like to do is:
    1. Continue to maximise CPU load as much as possible
    2. Keep memory usage within limits
    3. Allow users as much flexibility as possible (i.e.
       try to accommodate those wanting to run large memory
       jobs)


Currently most queues have a 4GB stack size limit, which
accommodates most users nicely without needing too many
queues or overly complex subordination. However, the number
of slots multiplied by the stack size limit is greater than
44GB. In practice, though, many users run jobs with a memory
footprint of rather less than 4GB.

We have a single slot queue enabled for large memory jobs
up to 16GB, and so the total memory usage possible is very
much greater than the available memory.
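
To put rough numbers on the oversubscription, here is a minimal
sketch of the worst-case arithmetic. The slot count for the general
queues is not stated above, so the figure of 20 (one per processor)
is purely an assumption, as are the queue names; only the 4GB and
16GB limits and the 44GB of RAM come from the description.

```python
# Worst-case memory commitment if every slot ran at its queue's limit.
# Queue names and the general slot count are hypothetical; the limits
# and the host RAM figure are taken from the description above.
HOST_RAM_GB = 44

queues = {
    "general": {"slots": 20, "limit_gb": 4},   # slot count assumed
    "bigmem": {"slots": 1, "limit_gb": 16},    # single slot, 16GB limit
}


def worst_case_gb(queues):
    """Sum of slots * per-job limit over all queues."""
    return sum(q["slots"] * q["limit_gb"] for q in queues.values())


committed = worst_case_gb(queues)
ratio = committed / HOST_RAM_GB
print(f"worst case {committed}GB on a {HOST_RAM_GB}GB host ({ratio:.1f}x)")
```

With these assumed numbers the limits alone allow roughly twice the
physical memory to be claimed, so the limits by themselves cannot
keep memory usage within bounds.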

Queue selection for users is via -l h_rss etc, with the users
suggesting what they think their memory usage will be for
that job.

What is the simplest way of keeping on top of the memory
usage issue, both from my point of view and for that of
the users? My initial thought is to create a consumable
resource for memory, for the host, that users can request.
However, there is no guarantee that users will request an
amount of memory that is accurate, and so some users may be
effectively locked out by others requesting more memory
than they need. This would then reduce throughput.
Also I need a mechanism to prevent most users from requesting
more than 4GB so I can control the users allowed to submit
very large memory jobs, again to ensure that throughput is
maintained.
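
The lock-out concern can be illustrated with a toy model of a
host-level memory consumable. This is only a sketch of the
bookkeeping, not Grid Engine's scheduler: the class name, the
per-job ceiling and the list of users allowed to exceed it are all
hypothetical.

```python
class MemoryConsumable:
    """Toy model of a host-level memory consumable (not Grid Engine code)."""

    def __init__(self, capacity_gb, per_job_cap_gb, big_users=()):
        self.free_gb = capacity_gb            # memory not yet reserved
        self.per_job_cap_gb = per_job_cap_gb  # ceiling for ordinary users
        self.big_users = set(big_users)       # users exempt from the ceiling

    def try_start(self, user, request_gb):
        """Reserve request_gb for a job; return True if it may start."""
        if request_gb > self.per_job_cap_gb and user not in self.big_users:
            return False  # over the per-job ceiling for this user
        if request_gb > self.free_gb:
            return False  # not enough unreserved memory left
        self.free_gb -= request_gb
        return True

    def finish(self, request_gb):
        """Release a finished job's reservation."""
        self.free_gb += request_gb


pool = MemoryConsumable(capacity_gb=44, per_job_cap_gb=4, big_users={"alice"})

# Eleven jobs each request the full 4GB, whatever they really use.
started = [pool.try_start("bob", 4) for _ in range(11)]

# All 44GB is now reserved, so even a 1GB request has to wait.
blocked = not pool.try_start("carol", 1)
print(all(started), blocked, pool.free_gb)
```

The reservation is charged on what was requested rather than what is
used, which is exactly why inflated requests would cost throughput.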

Any hints?

Thanks

    Aaron Turner



