[GE users] h_vmem and MVAPICH tight integration

Olli-Pekka Lehto oplehto at csc.fi
Wed Aug 16 13:06:26 BST 2006

Reuti wrote:
> Hi,
> On 16.08.2006 at 12:52, Olli-Pekka Lehto wrote:
>> Hello,
>> We're running SGE v.6u8 and have configured it to use tight
>> integration with MVAPICH (0.9.7-2). To enforce memory limits we are
>> using the h_vmem resource. However, with MVAPICH, setting the limit
>> with '-l h_vmem' causes processes to be killed even though only a
>> fraction of the reserved memory is used. We have 4-core Opteron
>> nodes with 8GB RAM and Mellanox InfiniHost III Ex HCAs. Is there a
>> way to work around this?
> Are the processes killed by SGE (then you could check the messages
> files on the qmaster and nodes for an entry), or are they dying on
> their own?
> For some programs it is also necessary to limit the stack if you use
> h_vmem. Sometimes a small value such as 32M is already enough.
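
For anyone following up on the suggestion to check the messages files, a minimal sketch could look like the following. It assumes the default spool layout under $SGE_ROOT/$SGE_CELL/spool; many sites relocate the execd spool directory, so the paths may need adjusting. The job ID 12345 and the /opt/sge fallback are placeholders, not values from this thread.

```shell
#!/bin/sh
# Sketch: search the SGE messages files for entries mentioning one job.
# Assumptions: default spool layout; JOB_ID and the SGE_ROOT fallback are placeholders.
JOB_ID="${1:-12345}"
SGE_ROOT="${SGE_ROOT:-/opt/sge}"
SGE_CELL="${SGE_CELL:-default}"

# The qmaster file exists only on the master host; the second path is the
# execd file for the host this script runs on.
for f in "$SGE_ROOT/$SGE_CELL/spool/qmaster/messages" \
         "$SGE_ROOT/$SGE_CELL/spool/$(uname -n)/messages"; do
    if [ -r "$f" ]; then
        echo "== $f"
        grep -- "$JOB_ID" "$f" || echo "(no entries mentioning job $JOB_ID)"
    else
        echo "== $f not readable on this host"
    fi
done
```

Run it on the qmaster host and on each execution node that hosted the job, passing the job ID as the first argument.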

Setting the stack limit explicitly to 32M with h_stack helped. Thank you!
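
As a rough sketch of what such a submission might look like, here is a job script that requests both limits and prints the limits it actually runs under. The parallel environment name "mvapich", the slot count, the 2G value for h_vmem, and the application name are hypothetical; only the 32M stack value comes from this thread.

```shell
#!/bin/sh
# Hypothetical SGE job script: PE name, slot count, and h_vmem value are assumptions.
#$ -pe mvapich 8
#$ -l h_vmem=2G
#$ -l h_stack=32M
#$ -cwd

# Print the hard limits in effect, to confirm they were applied to the job.
echo "hard stack limit (kB): $(ulimit -H -s)"
echo "hard virtual memory limit (kB): $(ulimit -H -v)"

# mpirun ./my_mpi_app   # hypothetical application launch, left commented out
```

The same requests can be given on the command line instead, for example `qsub -l h_vmem=2G,h_stack=32M job.sh`.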

For future reference, this is the (somewhat misleading) error message
that the failed runs produced:

[1] Abort: cannot open HCA (Resources temporary unavailable) at line 323 in file viainit.c

By the way, is there a mechanism in SGE to report to the end user and/or 
administrator that a job has been killed due to resource overconsumption?

Olli-Pekka Lehto, Systems Specialist, Systems Services, CSC
PO Box 405 02101 Espoo, Finland; tel +358 9 457 2215, fax +358 9 4572302
CSC is the Finnish IT Center for Science, www.csc.fi,
e-mail: Olli-Pekka.Lehto at csc.fi
