[GE issues] [Issue 2916] New - qrsh large memory consumption in IA64

jlopez jlopez at cesga.es
Mon Feb 16 14:57:33 GMT 2009


http://gridengine.sunsource.net/issues/show_bug.cgi?id=2916
                 Issue #|2916
                 Summary|qrsh large memory consumption in IA64
               Component|gridengine
                 Version|6.2u1
                Platform|All
                     URL|
              OS/Version|All
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P3
            Subcomponent|execution
             Assigned to|pollinger
             Reported by|jlopez

------- Additional comments from jlopez at sunsource.net Mon Feb 16 06:57:32 -0800 2009 -------
We have found that qrsh processes using the builtin method use more than 
500MB of virtual memory per process on our IA64 cluster. 

This means that memory consumption on the MASTER node increases rapidly 
as the number of slaves increases.

Here is an example:
18481 aurelio   15   0  519m 4128 3440 S    0  0.0   0:00.02 qrsh
18482 aurelio   15   0  519m 4128 3440 S    0  0.0   0:00.01 qrsh
18475 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.01 qrsh
18476 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.02 qrsh
18477 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.01 qrsh
18478 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.01 qrsh
18479 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.01 qrsh
18480 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.01 qrsh
18483 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.02 qrsh
18484 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.01 qrsh
18485 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.01 qrsh
18486 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.01 qrsh
18487 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.02 qrsh
18488 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.01 qrsh
18489 aurelio   15   0  519m 3968 3296 S    0  0.0   0:00.00 qrsh

And the same job resubmitted, but using ssh to spawn the processes:
19560 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.02 ssh
19561 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.02 ssh
19562 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.03 ssh
19563 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.03 ssh
19564 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.02 ssh
19565 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.02 ssh
19566 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.02 ssh
19567 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.02 ssh
19568 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.02 ssh
19569 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.01 ssh
19570 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.02 ssh
19571 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.03 ssh
19572 aurelio   15   0 12240 5152 3920 S    0  0.0   0:00.04 ssh
19573 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.03 ssh
19574 aurelio   16   0 12240 5152 3920 S    0  0.0   0:00.03 ssh

As can be seen, in the first case the virtual memory consumed by the 
job increases by about 7GB.
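
The arithmetic behind that figure can be checked with a short sketch. It 
assumes the fifth column of the listings above is top's VIRT field, that 
a bare number is in KiB and that an "m" or "g" suffix means MiB or GiB 
(the usual top convention). The helper names are hypothetical and only 
serve the illustration.

```python
def virt_to_kib(field: str) -> float:
    """Convert a top VIRT field to KiB (assumes bare = KiB, m = MiB, g = GiB)."""
    units = {"m": 1024, "g": 1024 * 1024}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field)


def total_virt_gib(process_count: int, virt_field: str) -> float:
    """Total virtual memory in GiB for identical processes."""
    return process_count * virt_to_kib(virt_field) / (1024 * 1024)


# 15 processes per run, VIRT values taken from the two listings above.
qrsh_total = total_virt_gib(15, "519m")
ssh_total = total_virt_gib(15, "12240")

print(f"qrsh: {qrsh_total:.2f} GiB")
print(f"ssh:  {ssh_total:.2f} GiB")
print(f"difference: {qrsh_total - ssh_total:.2f} GiB")
```

With the values from the listings this gives roughly 7.6 GiB for the 
qrsh run against under 0.2 GiB for the ssh run.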


In some cases the problem is even worse: some qrsh processes consume 
4GB of virtual memory after running for several hours:
25141 csedamsp  15   0 4104m 3984 3296 S    0  0.0   0:00.00 qrsh
25142 csedamsp  16   0 4104m 3984 3296 S    0  0.0   0:00.01 qrsh
25140 csedamsp  15   0 4103m 3968 3296 S    0  0.0   0:00.01 qrsh
25143 csedamsp  15   0 4103m 3968 3296 S    0  0.0   0:00.02 qrsh

We tried recompiling qrsh with the Intel compiler and got the same behavior.

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=107340
