[GE users] Error: libc.so.6: failed to map segment from shared object:

prentice prentice at ias.edu
Wed Feb 25 15:54:54 GMT 2009


reuti wrote:
> On 25.02.2009 at 15:43, prentice wrote:
> 
>> reuti wrote:
>>> On 24.02.2009 at 19:11, prentice wrote:
>>>
>>>> My cluster nodes have 16GB of RAM, which SGE detects as 15.7G. I want
>>>> to set h_vmem as consumable, so I set h_vmem on all my nodes to a value
>>>> safely below that limit, say 15G (I've also tried 15.5G and 15.7G, with
>>>> the same effect):
>>>>
>>>> for i in $(seq -w 64); do qconf -mattr exechost complex_values
>>>> h_vmem=15G node${i}; done
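>>>>
>>>> (To double-check what actually got set, qconf -se node01 should show
>>>> h_vmem=15G under complex_values for that host.)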
>>>>
>>>> I then set h_vmem to be consumable:
>>>>
>>>> h_vmem              h_vmem     MEMORY      <=    YES         YES        0        0
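>>>>
>>>> (That's the line as it appears in qconf -mc; if I recall the header
>>>> correctly, the column order there is
>>>>
>>>> #name    shortcut   type     relop  requestable  consumable  default  urgency
>>>>
>>>> so the two trailing zeros are the default consumption and the urgency.)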
>>>>
>>>> Now when I run an MPI test job (a simple "hello, world" type program
>>>> that I've been using as a test case for months now), I get this error:
>>>>
>>>> mpirun: error while loading shared libraries: libc.so.6: failed to map
>>>> segment from shared object: Cannot allocate memory
>>> If h_vmem is set, it's also often necessary to request -l h_stack=32M
>>> or 128M.
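>>>
>>> A possible submission along those lines (the PE name and script here are
>>> only placeholders, and the sizes are illustrative) would be:
>>>
>>>    qsub -pe mpi 8 -l h_vmem=1.9G,h_stack=128M hello_mpi.sh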
>>>
>> I've found that I need to set -l h_stack=128M. When h_vmem is set to be
>> consumable, I get this error:
>>
>>  main|node01|W|job 2785 exceeds job hard limit "h_vmem" of queue
>> "all.q@node01.aurora" (123715584.00000 > limit: 0.00000) - sending SIGKILL
>>
>> I get this error whether I set the default h_vmem to 0, 1.8G, or 1.9G;
>> the latter two should be within the per-node limit ((1.9G x 8) < 15.7G).
> 
> This is only what was granted. If it weren't available, the job wouldn't
> have been scheduled at all. So the job was started and then consumed more
> than was granted - with a limit of 0 that's not surprising. Did you define
> it in the queue definition?

No, I didn't think I needed to define a default value. I defined a
default of 1.9G a few minutes ago, and that seems to have fixed the
problem. It wasn't working earlier because, while undoing my changes to
get the cluster back to a working state, I had removed '-l h_vmem=128M'
from my sge_request file.
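
For the record, the complex line now looks roughly like this (spacing from
memory, but the default column is the part that matters):

h_vmem              h_vmem     MEMORY      <=    YES         YES        1.9G     0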

It looks like things are finally working stably again.

> 
>> Any idea what's causing this error?
> 
> Be aware that the limit covers your program code plus your data.
> 
> -- Reuti
> 
> 
>> h_vmem for each node is currently set to 15.7G.


-- 
Prentice
