[GE users] complex use of complexes
gragghia at utk.edu
Mon May 10 23:43:37 BST 2010
I was planning to set it up exactly as you describe, but as far as I know it won't work for this particular job. The problem is that when a single large MPI job makes a complex request (-l h_vmem=128G), that request is understood to be _per process_: a 480-process job with "-l h_vmem=128G" will actually be requesting about 61TB of RAM. I need to be able to request 128GB of RAM for only the first process of the job and 2GB of RAM for each of the other processes. As far as I can tell this is not possible (please prove me wrong, though), so I am looking at ways to make an exception for this particular job.
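To make the arithmetic concrete, here is a minimal sketch of the per-process accounting described above. The function name is hypothetical and this only multiplies numbers; it does not query Grid Engine.

```python
def total_request_gb(processes: int, per_process_gb: int) -> int:
    """Total memory requested when the h_vmem value is applied per process."""
    return processes * per_process_gb


# What "-l h_vmem=128G" amounts to for a 480-process job.
uniform_gb = total_request_gb(480, 128)

# What the job actually needs: 128GB for rank zero, 2GB for the other 479.
needed_gb = 128 + total_request_gb(479, 2)

print(f"uniform request: {uniform_gb} GB")  # 61440 GB, about 61TB
print(f"actual need:     {needed_gb} GB")   # 1086 GB, about 1TB
```

The uniform request is more than 50 times what the job really uses, which is why a single per-process value cannot describe it.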
On 5/10/2010 6:08 PM, iank wrote:
On Mon, May 10, 2010 at 2:43 PM, gragghia <gragghia at utk.edu> wrote:
We want to add management of RAM usage on our system by making
h_vmem consumable and adding a reasonable default value. The problem
is that we have one parallel job that needs to use 128GB on its rank
zero "master" process but only 2GB of RAM on the slave processes.
Setting h_vmem=2G will cause the rank zero process to fail, but with
h_vmem=128G the job will not be able to run because the other compute nodes do
not have that much RAM. How can we make this work? If it isn't
possible, can we make an exception to the management of vmem so that
this job doesn't get killed due to over-usage?
You should be able to make h_vmem consumable and requestable. That way, the default can be set at 2GB, which means a job that is not submitted with -l h_vmem=XG will grab 2GB by default but can still request more. The "master" process can then be submitted with -l h_vmem=128G, thus ensuring it can only run on the one system with the correct amount of RAM. You can also require the -l h_vmem flag, so that every job submitted must ask for the proper amount of RAM or else it will not run.
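As a rough illustration of the idea, the sketch below models a consumable with a 2GB default and shows which hosts could accept a single-process request. It is a toy model, not Grid Engine's scheduler: the host names and free-memory figures are made up, and the function is hypothetical.

```python
DEFAULT_H_VMEM_GB = 2  # default applied when no -l h_vmem is given


def eligible_hosts(free_gb_by_host, request_gb=None):
    """Return the hosts with enough free memory for one process."""
    request = DEFAULT_H_VMEM_GB if request_gb is None else request_gb
    return [host for host, free in free_gb_by_host.items() if free >= request]


# Hypothetical cluster: one large-memory node and two ordinary nodes.
free_gb = {"bigmem01": 256, "node01": 16, "node02": 16}

print(eligible_hosts(free_gb))       # default 2GB request: every host qualifies
print(eligible_hosts(free_gb, 128))  # 128GB request: only the large-memory host
```

This covers independently submitted processes only; it does not address a single parallel job whose ranks need different amounts of memory.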
Research Systems Administrator
UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
Newton HPC Program http://newton.utk.edu/
Office of Information Technology
Research Computing Support
Professional Technical Services
The University of Tennessee
2309 Kingston Pike
Knoxville, TN 37996
| One Contact OIT: 865-974-9900 |
| Many Solutions help.utk.edu |