[GE users] Memory quotas

reuti reuti at staff.uni-marburg.de
Tue Feb 10 10:58:38 GMT 2009


On 09.02.2009 at 21:02, davidecittaro wrote:

> Hi all, I've a question concerning quota for memory.
> We have here users that drained all the available memory with a little
> number of jobs, all this followed by totally unresponsive servers :-(
> I've read the archives and tried the h_vmem as a consumable, but in
> the end what I've done is setting virtual_free as consumable and
> writing a resource quota for it:

The difference is that h_vmem is enforced, while virtual_free is only
guidance for SGE. That works as long as the users are fair and know
what their jobs consume. With h_vmem, some jobs might crash when they
need just one byte more than requested, but over time these requests
can be adjusted.

> $ qconf -srqs MemoryQuota
> {
>     name         MemoryQuota
>     description  Memory quota for users. Nobody can use more than 30
> Gb RAM
>     enabled      TRUE
>     limit        users {*}  to virtual_free=30G
> }

But this limit is per user, not per host or per job (i.e. slot). Do you
have one big SMP machine or a cluster of nodes?
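
If it is a cluster, a per-host variant of your rule set might look
roughly like the sketch below. The name "MemoryQuotaPerHost" and the
30G value are only placeholders, not something from your setup; the
expanded list "hosts {*}" makes the limit apply to each host
separately, in the same way "users {*}" applies it to each user.

```
{
    name         MemoryQuotaPerHost
    description  "Example only: 30G of virtual_free per user on each host"
    enabled      TRUE
    limit        users {*} hosts {*} to virtual_free=30G
}
```

Dropping "users {*}" from the limit line would cap the total on each
host regardless of who runs the jobs.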

> This, at least, allows me to keep jobs in qw state until the quota is
> exceeded. Good. Also, since virtual_free is a load sensor it is
> reported to the quota even if it is not requested. Plus, if a user
> specifies -l virtual_free=X, his remaining quota is lowered by X.
> This seems to be a fair solution but I have some issues I suspect are
> not easy to solve:
> - How can I handle users that run jobs that exceed quota while they
> are running? I mean, if an user submits a job that at a certain point
> allocates for 50 Gb, it drains lot of the memory available

Use h_vmem and these jobs will be killed. You could also specify
h_vmem as FORCED in the consumable configuration and/or set a high
default value (which users could then lower). If you implement this, it
could be useful to enable reservation in the scheduler and request
reservation with "-R y" in your qsub request.
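
A rough sketch of these steps, assuming a host called "node01" with
32G to hand out and a job script "job.sh" (all three are made-up
examples, as are the 4G request and the max_reservation value):

```
# 1) "qconf -mc": make h_vmem a consumable that every job must request
#name    shortcut  type    relop  requestable  consumable  default  urgency
h_vmem   h_vmem    MEMORY  <=     FORCED       YES         0        0

# 2) "qconf -me node01": give the host a capacity for the consumable
complex_values        h_vmem=32G

# 3) "qconf -msconf": allow the scheduler to make reservations
max_reservation       32

# 4) submit with a memory request and ask for a reservation
qsub -R y -l h_vmem=4G job.sh
```

With requestable set to YES instead of FORCED, the value in the
"default" column is what gets booked for jobs that do not request
h_vmem themselves.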

> - I cannot set a suspend threshold for memory, as the memory
> referenced by a process can't be lowered while it is running (isn't  
> it?)

Correct. Also, suspended jobs still occupy the resources they were
granted.

-- Reuti

