[SGE-discuss] suspend threshold ping-pong

Stella Levin stella_levin2003 at yahoo.com
Sat Oct 8 23:11:18 BST 2011


Yes. In our environment the jobs are memory-intensive, and memory is a critical resource.

In some cases we cannot correctly predict the size of a job.

Thanks for the reply.
Stella


________________________________
From: Reuti <reuti at staff.uni-marburg.de>
To: Stella Levin <stella_levin2003 at yahoo.com>
Cc: "sge-discuss at liv.ac.uk" <sge-discuss at liverpool.ac.uk>
Sent: Friday, October 7, 2011 1:51 AM
Subject: Re: [SGE-discuss] suspend threshold ping-pong

Hi,

On 02.10.2011 at 11:29, Stella Levin wrote:

> Hi sge-discuss group,
> we defined
> suspend_thresholds mt_mem_swap_io=1
> where
> mt_mem_swap_io=1 whenever "writing to swap" happens (the "so" column of vmstat).
> We experience "ping-pong" behavior, with jobs being suspended and continued repeatedly.
> A job starts to write to swap and is suspended. After the suspension no other jobs are writing to swap, so within suspend_interval the job is continued... and then suspended again and continued again.
> Sometimes we cannot predict the exact size of a job, and it starts to swap.
> 
> - Is it possible to continue the job under different threshold conditions, for example when there is enough free memory on the host for the job, or something similar?
> - Are there other options to solve the problem?

Unfortunately this is a known "bug/feature/issue", and there is already a ticket for it:

https://arc.liv.ac.uk/trac/SGE/ticket/92

It has not been implemented, though.
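
Until something like that exists, one possible workaround is to put the hysteresis into the load sensor itself rather than into the thresholds. The following is only a minimal sketch in Python, not an SGE feature and not the sensor used in the setup described above. It assumes the sensor can read both the swap-out rate and the free memory of the host (for example from vmstat). The class name LatchedSwapFlag, the parameter release_free_mb, and the 2048 MB value are made up for illustration. The reported value goes to 1 as soon as swapping is seen, and drops back to 0 only once swapping has stopped and enough memory is free again.

```python
class LatchedSwapFlag:
    """Latch the reported flag at 1 once swap-out is seen.

    The flag is released only when there is no swap-out AND free memory
    is at least release_free_mb (a hypothetical, site-specific value).
    """

    def __init__(self, release_free_mb):
        self.release_free_mb = release_free_mb
        self.flag = 0

    def update(self, swap_out, free_mb):
        if swap_out > 0:
            # Host is writing to swap: raise (or keep) the flag.
            self.flag = 1
        elif free_mb >= self.release_free_mb:
            # No swapping and enough free memory: release the flag.
            self.flag = 0
        # Otherwise keep the previous state (this is the hysteresis).
        return self.flag


def format_report(host, value, name="mt_mem_swap_io"):
    """Format one load report block in the begin/host:name:value/end form."""
    return "begin\n{}:{}:{}\nend\n".format(host, name, value)


# Example run with made-up samples of (swap_out, free_mb).
latch = LatchedSwapFlag(release_free_mb=2048)
samples = [(0, 4000), (12, 100), (0, 500), (0, 1500), (0, 3000), (0, 500)]
flags = [latch.update(so, free) for so, free in samples]
```

With these samples the flag stays at 1 through the two quiet readings where free memory is still below the release value, so a suspended job would not be continued merely because swapping stopped after the suspension. Whether free memory is the right release condition depends on the site, since a suspended job still holds its memory.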

==

Is your current setup to schedule jobs to a node until it starts to swap?

-- Reuti


> Thanks a lot.
> Stella
> 
> 
> 
> _______________________________________________
> SGE-discuss mailing list
> SGE-discuss at liv.ac.uk
> https://arc.liv.ac.uk/mailman/listinfo/sge-discuss