[GE users] complex use of complexes

mhanby mhanby at uab.edu
Wed May 12 00:16:10 BST 2010


Dang, I figured (assuming tight integration) that the consumption of the master and workers would all be summed.

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Tuesday, May 11, 2010 11:44 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] complex use of complexes

Am 11.05.2010 um 15:56 schrieb mhanby:

> another workaround would be to make h_vmem consumable = JOB instead of YES. (I believe I have the terminology correct)
>  
> You would then request h_vmem as the sum for the entire job and not per task.
>  
> You'd have to train your users appropriately to think in terms of summing their RAM requirements.

I fear this won't work: the value will be subtracted only once, and only from the master queue. So all the consumption on the slave nodes is ignored, even though it takes place.

-- Reuti
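For reference, the consumable=JOB change being discussed (which, per Reuti's caveat above, would only be debited on the master host) would be made roughly like this. This is a hypothetical sketch; the column layout follows sge_complex(5) and should be verified against your Grid Engine version:

```shell
# Show the current h_vmem complex definition (per-slot consumable):
qconf -sc | grep h_vmem
#   name    shortcut  type    relop  requestable  consumable  default  urgency
#   h_vmem  h_vmem    MEMORY  <=     YES          YES         2G       0

# Edit the complex list and change the "consumable" column to JOB:
qconf -mc
#   h_vmem  h_vmem    MEMORY  <=     YES          JOB         2G       0

# Users would then request the total for the whole job, e.g. for the
# job in this thread: 128G (rank 0) + 2G x 479 slaves = 1086G.
qsub -pe mpi 480 -l h_vmem=1086G job.sh
```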


>  
> From: gragghia [mailto:gragghia at utk.edu] 
> Sent: Monday, May 10, 2010 5:44 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] complex use of complexes
>  
> I was planning to set it up exactly as you describe; however, it won't work for this particular job (as far as I know).  The problem is that when a single large MPI job makes a complex request (-l h_vmem=128G), that request is interpreted _per process_:  a 480-process job with "-l h_vmem=128G" will actually be requesting about 61TB of RAM.  I need to be able to request 128GB of RAM for only the first process of the job and then 2GB of RAM for all the other processes.  This is not possible (please prove me wrong though), so I am looking at ways to make an exception for this particular job.
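The per-slot accounting Gerald describes can be checked with quick arithmetic (numbers taken from the message above):

```shell
# Each of the 480 MPI slots reserves the full per-slot h_vmem request,
# so the scheduler must find 480 x 128 GB in total.
slots=480
per_slot_gb=128
total_gb=$((slots * per_slot_gb))
total_tb=$((total_gb / 1000))
echo "total request: ${total_gb} GB (~${total_tb} TB)"
# prints: total request: 61440 GB (~61 TB)
```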
> 
> - Gerald
> 
> On 5/10/2010 6:08 PM, iank wrote:
> Hi Gerald,
> 
> On Mon, May 10, 2010 at 2:43 PM, gragghia <gragghia at utk.edu> wrote:
> We want to manage RAM usage on our system by setting h_vmem to
> consumable and adding a reasonable default value.  The problem is
> that we have one parallel job that needs 128GB on its rank zero
> "master" process but only 2GB of RAM on each slave process.  Setting
> h_vmem=2G will cause the rank zero process to fail, but a job
> requesting h_vmem=128G will not be able to run because the other
> compute nodes do not have that much RAM.  How can we make this work?
> If it isn't possible, can we make an exception to the management of
> vmem so that this job doesn't get killed due to over-usage?
> 
> Thanks,
> 
> 
> You should be able to make h_vmem consumable and requestable. That way, the default may be set at 2GB, meaning that a job submitted without -l h_vmem=XG will by default grab 2GB, but it can request more. The "master" process can then be submitted with -l h_vmem=128G, thus ensuring it can only run on the one system with that much RAM. You can also require the -l h_vmem flag to be used, so that every job submitted must ask for the proper amount of RAM or else it will not run.
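A sketch of the setup Ian describes (hypothetical host names and values; column layout per sge_complex(5)):

```shell
# Make h_vmem requestable and consumable per slot, with a 2G default
# that applies to any job submitted without an explicit -l h_vmem:
qconf -mc
#   name    shortcut  type    relop  requestable  consumable  default  urgency
#   h_vmem  h_vmem    MEMORY  <=     YES          YES         2G       0

# Publish each host's real memory so the consumable can be decremented
# against it ("node01" is a placeholder host name):
qconf -me node01       # set: complex_values  h_vmem=256G

# The 2G default applies here:
qsub job.sh
# The large "master" job requests 128G explicitly, so it can only be
# scheduled on a host with that much h_vmem available:
qsub -l h_vmem=128G big_master_job.sh

# To force every submission to request h_vmem, set the "requestable"
# column to FORCED instead of YES.
```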
> 
> Ian
> -- 
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
> 
> 
> 
> -- 
> Gerald Ragghianti
>  
> Newton HPC Program http://newton.utk.edu/
> Office of Information Technology
>   Research Computing Support
>   Professional Technical Services
>  
> The University of Tennessee
> 2309 Kingston Pike
> Knoxville, TN 37996
> Phone: 865-974-2448
>  
> /-------------------------------------\
> | One Contact       OIT: 865-974-9900 |
> | Many Solutions         help.utk.edu |
> \-------------------------------------/

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=256966

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257006



