[GE users] qrsh segfault with vmemoryuse limit

spiderman minoru.hamakawa at sun.com
Wed Mar 18 13:54:39 GMT 2009


Ohi-san,

This seems to depend on the stack size.
When you set s_vmem on a queue (for example, to 1073741824),
SGE appears to set s_stack to the same value as s_vmem.
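
A quick check on the strace output quoted below fits this explanation: each failed mmap request is exactly 4096 bytes larger than the s_vmem value in effect at the time. That would match a stack-sized mapping plus one 4 KiB guard page, although the guard-page reading is an assumption and is not confirmed anywhere in this thread. The helper name mmap_overhead is hypothetical and only used for illustration.

```shell
# mmap_overhead LIMIT REQUEST
# Print how many bytes an mmap request exceeds a given limit by.
# (Hypothetical helper, not part of SGE; needs 64-bit shell arithmetic.)
mmap_overhead() {
    echo $(( $2 - $1 ))
}

# s_vmem = 34359738368, mmap request = 34359742464
mmap_overhead 34359738368 34359742464   # prints 4096

# s_vmem = 68719476736, mmap request = 68719480832
mmap_overhead 68719476736 68719480832   # prints 4096
```

In both cases the request can never fit under the limit, because it is slightly larger than the limit itself.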

So please set s_stack on the queue to 10485760 (10 MB).
That should resolve the problem.
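
A minimal sketch of how this could be applied from the command line, assuming the queue is all.q as in the quoted message and that you have SGE manager rights. The wrapper name set_queue_stack is hypothetical; the only SGE call it makes is qconf -mattr, which changes a single attribute of an existing queue.

```shell
# set_queue_stack QUEUE BYTES
# Hypothetical wrapper (not part of SGE): set the soft stack limit
# (s_stack) of a queue to BYTES.
set_queue_stack() {
    queue=$1
    bytes=$2
    # qconf -mattr <object type> <attribute> <value> <object name>
    qconf -mattr queue s_stack "$bytes" "$queue"
}

# Example (run as an SGE manager):
#   set_queue_stack all.q 10485760
#   qconf -sq all.q | grep s_stack    # check the new value
```

Editing the queue interactively with qconf -mq all.q and changing the s_stack line should have the same effect.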

Thank you.
Minoru Hamakawa

ohi wrote:
> Hi,
> 
> I use a soft vmemoryuse limit (s_vmem) in my queue.
>> [ohi@gw16 ~]$ qconf -sq all.q |grep s_vmem
>> s_vmem                34359738368
> 
> When I run qrsh, I get the following error.
>> [ohi@gw16 ~]$ qrsh pwd
>> Segmentation fault
> 
> When I removed the vmemoryuse limit with the shell's unlimit command,
> I did not get the segfault.
>> [ohi@gw16 ~]$ unlimit vmemoryuse
>> [ohi@gw16 ~]$ qrsh pwd
>> /home/ohi
> 
> I ran strace on qrsh with the vmemoryuse limit in place
> and got the following output.
>> mmap(NULL, 34359742464, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>> mmap(NULL, 34359742464, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
>> +++ killed by SIGSEGV +++
> qrsh called mmap and tried to allocate
> 34,359,742,464 bytes of memory.
> This exceeds my vmemoryuse limit of
> 34,359,738,368 bytes.
> 
> Why does qrsh try to allocate so much memory?
> Is this a bug?
> 
> I found this problem when I submitted
> an MPI job. The MPI job crashed because of
> the qrsh segfault.
> 
> I also tried a different vmemoryuse value.
>> [ohi@gw16 ~]$ qconf -sq all.q |grep s_vmem
>> s_vmem                68719476736
> 
> This time, qrsh tried to allocate the amount of memory shown below,
> and segfaulted again.
>> mmap(NULL, 68719480832, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>> mmap(NULL, 68719480832, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
>> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
>> +++ killed by SIGSEGV +++
> 
> My server environment is as follows.
>> [ohi@gw16 ~]$ uname -a
>> Linux gw16 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:32:05 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
>> [ohi@gw16 lx24-amd64]$ ./sge_qmaster -help
>> SGE 6.2u2
>> usage: sge_qmaster [options]
>>    [-help]                                  print this help
> I use tcsh.
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=129443
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=135361