[GE users] h_vmem and MVAPICH tight integration

Reuti reuti at staff.uni-marburg.de
Wed Aug 16 13:38:27 BST 2006


On 16.08.2006, at 14:06, Olli-Pekka Lehto wrote:

> Reuti wrote:
>> Hi,
>> On 16.08.2006, at 12:52, Olli-Pekka Lehto wrote:
>>> Hello,
>>>
>>> We're running SGE 6u8 and have configured it to use tight
>>> integration with MVAPICH (0.9.7-2). To enforce memory limits we
>>> are using the h_vmem resource. However, with MVAPICH, setting the
>>> limit with '-l h_vmem' causes processes to be killed even though
>>> only a fraction of the reserved memory is used. We have 4-core
>>> Opteron nodes with 8GB RAM and Mellanox InfiniHost III Ex HCAs.
>>> Is there a way to work around this?
>> are the processes killed by SGE (then you could check the
>> messages files on the qmaster and the nodes for an entry), or are
>> they dying on their own?
>>
>> For some programs it is also necessary to limit the stack if you
>> use h_vmem. Sometimes a small value like 32M already works.
>
> Setting the stack limit explicitly to 32M with h_stack helped.  
> Thank you!
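
Good to hear. For the archives, a complete submission requesting
both limits could look like this (the parallel environment name
"mvapich", the slot count, and the script name are only placeholders
for your setup):

   qsub -pe mvapich 4 -l h_vmem=2G -l h_stack=32M job.sh

Inside the job script you can verify the limits SGE actually set,
e.g. with the shell builtins:

   ulimit -v   # virtual memory limit in kB, derived from h_vmem
   ulimit -s   # stack size limit in kB, derived from h_stack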
>
> For future reference, this is the (somewhat misleading) error
> message that the failed runs produced:
>
> [1] Abort: cannot open HCA (Resources temporary unavailable) at
> line 323 in file viainit.c
>
> By the way, is there a mechanism in SGE to report to the end user
> and/or administrator that a job has been killed due to resource
> overconsumption?

Unfortunately you only get the usual e-mail notification (-m a).
Providing a better error message in this case is already a known
issue for SGE.
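
To check after the fact whether a job ran into such a limit, the
accounting record and the messages files are the usual places to
look (this assumes the cell name "default" and classic spooling;
adjust the paths for your installation):

   # accounting: shows failed/exit_status and the peak memory (maxvmem)
   qacct -j <job_id>

   # entries written by the qmaster
   grep <job_id> $SGE_ROOT/default/spool/qmaster/messages

   # entries written by the execd on the node that ran the job
   grep <job_id> $SGE_ROOT/default/spool/<hostname>/messages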

-- Reuti  
