[GE users] SGE large memory jobs

adary adary at marvell.com
Thu Jul 16 07:51:07 BST 2009


Yes, you just need to change the NO to YES to make it consumable.
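
If you would rather not edit the table by hand in qconf -mc, the same change can be scripted. This is only a sketch: it assumes the column order shown in your qconf -mc output (consumable is the sixth field), and the function name make_consumable and the temp file path are made up for illustration.

```shell
# Flip the "consumable" column (6th field) to YES for one complex.
# Comment lines and all other complexes are printed unchanged.
make_consumable() {
    awk -v name="$1" '$1 == name { $6 = "YES" } { print }'
}

# Intended use (not run here): dump, edit, and reload the complex list.
#   qconf -sc | make_consumable mem_free > /tmp/complexes.txt
#   qconf -Mc /tmp/complexes.txt
```

The edited line loses its column alignment because awk rebuilds it with single spaces, which should not matter since the fields are whitespace-separated.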

For your other question, you need to add mem_free to each host, but you can actually automate it a bit:

for host in $(qhost | sed '1,3d' | awk '{ print $1 }'); do qconf -mattr exechost complex_values mem_free=8G "$host"; done

(if all your hosts are 8G hosts, of course)

My recommendation is to always define hosts with about 5% less RAM than they physically have, and to size RAM requests accordingly, since some RAM is always used by the OS itself.
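
To make the 5% rule concrete, here is a small sketch that turns a host's physical RAM in megabytes into a value for complex_values. The helper name usable_mem and the choice of megabytes as the input unit are assumptions for the example, not something SGE requires.

```shell
# Print 95% of the given RAM size (in MB) as an SGE memory value.
# Integer division rounds down, which errs on the safe side.
usable_mem() {
    total_mb=$1
    echo "$(( total_mb * 95 / 100 ))M"
}

# Example: an 8G host (8192 MB) prints 7782M
usable_mem 8192
```

The result can then be substituted for the fixed 8G, for example mem_free=$(usable_mem 8192) in the qconf -mattr call.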

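As for consequences: once you make mem_free consumable and your jobs start requesting RAM, you need to make sure that every job that is submitted also requests RAM. With a default of 0, a job that does not request mem_free consumes nothing from the host's total, so the scheduler cannot account for the memory it actually uses.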

In our case we created a general wrapper for qsub that always requests RAM based on certain defaults, and each user can override it according to the needs of the application.
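
A minimal sketch of what such a wrapper could look like. This is not the actual script described above: the function name qsub_with_mem and the 1G default are hypothetical, and it only prints the command it would run so it can be tried safely.

```shell
# Hypothetical site-wide default; pick whatever suits your nodes.
DEFAULT_MEM_FREE=1G

# Print the qsub command line that would be run. If no argument already
# requests mem_free (or its shortcut mf), prepend the default request.
qsub_with_mem() {
    for arg in "$@"; do
        case $arg in
            mem_free=*|*,mem_free=*|mf=*|*,mf=*)
                echo qsub "$@"
                return 0
                ;;
        esac
    done
    echo qsub -l "mem_free=$DEFAULT_MEM_FREE" "$@"
}
```

For example, qsub_with_mem java.sge prints "qsub -l mem_free=1G java.sge", while qsub_with_mem -l mem_free=5G java.sge is passed through unchanged. Replace echo with the real qsub call once you are happy with it. Note that this sketch only looks at command-line arguments; a request embedded in the job script with a #$ -l line is not detected, and as far as I know command-line options take precedence over embedded ones, so a real wrapper would need to handle that case too.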

-----Original Message-----
From: mbay2002 [mailto:jeff at haferman.com]
Sent: Wednesday, July 15, 2009 11:54 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] SGE large memory jobs

Great, this has all been very helpful.  I really am a rookie with
configuring SGE, so I just want to make sure I get everything right, so
here are 3 additional questions:

1) when I run the qconf -mc, the mem_free line now looks like

#name      shortcut   type      relop requestable consumable default  urgency
#----------------------------------------------------------------------------
mem_free   mf         MEMORY    <=    YES         NO          0        0

Question: do I simply want to change "NO" to "YES" for consumable?  Are
defaults of 0 and 0 for default and urgency okay in general?


2) for the qconf -me <exechost>, the file now looks like

hostname              compute-0-0.local
load_scaling          NONE
complex_values        NONE
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE

So, I'll change complex_values to "mem_free=8G"  (because we have 8G
available - so the guy submitting the 5G job will use 5G of the 8G
available so there will still be resources left for others).  Where does
this file live so I can do a sed for all 144 compute nodes that we have,
or is there a way with qmon to set this for all exechosts?

3) Any "unintended consequences" that I should be aware of in making
mem_free a consumable?  e.g., might other qsub jobs break?  When I used
other schedulers in the past, the amount of memory needed was always front
and center in the job submission scripts.  I'm surprised that this topic
isn't more obvious in the documentation.

Jeff



dom wrote:
> Hi,
> you have to set  the complex_values for each exechost.
> In your case set complex_values to mem_free=5G (using qconf -me <exechost>)
> Set the consumable column of the mem_free complex to YES (qconf -mc) and submit
> your jobs like this:
>
> qsub -l mem_free=5G java.sge
>
>
> Now only one job per host will be scheduled. If you set complex_values to 8G
> instead, you will have 8G of memory available to consume.
>
> Marco
>
> On 07/12/09 04:43, mbay2002 wrote:
>> A user is running a java based script that requires 5 GB of memory. Each of our nodes has 8 GB of RAM available.
>>
>> He is trying something along the lines of
>>
>> qsub -l mem_free=5G java.sge
>> sleep 10
>> qsub -l mem_free=5G java.sge
>>
>> However, even though we have plenty of open nodes, after submitting several jobs like this, a few end up on the same node, and the jobs start paging.
>>
>> We have a pretty vanilla install of SGE, we haven't done any special configuration.  I honestly do not know SGE well enough to know if there is a simple way to ensure that these jobs get assigned one per node.
>>
>> I've done a bit of RTFM'ing, but could use a hint at this point.
>>
>> This is SGE 6.2
>>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=207419

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=207491
