[GE users] Scalability of 60u3/u4

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Thu Apr 28 08:26:25 BST 2005



Hi Chris,

I added some comments below.

Cheers,
Stephan

Chris Croswhite wrote:

>Rayson,
>
>Perfect.  Thanks for the information!
>
>
>On Wed, 2005-04-27 at 16:41, Rayson Ho wrote:
>  
>
>>Several hundred hosts are not a problem with SGE5.3 or SGE6.0; many users have
>>larger clusters.
>>
>>For a huge number of jobs, you may need a 64-bit machine so that
>>qmaster/scheduler can allocate more than 2GB of memory.
>>
>>And SGE doesn't use RSH to dispatch jobs to remote hosts; RSH is only used
>>for interactive jobs. And you don't even need the normal RSH daemon
>>enabled on the hosts.
>>
>>Rayson
>>
>>
>>    
>>
>>>Need some help understanding if SGE 60u3/u4 can meet my needs.  I have
>>>used SGE on a small scale, roughly 100 queue instances with no more than
>>>500 jobs ever queued/running.  My question is: can SGE support a few
>>>hundred queue instances (400-500) with 150,000-300,000 jobs
>>>queued/running?  The environment will be pumping through thousands of
>>>very short jobs (5 to 60 second runs), yet there will literally be
>>>hundreds of thousands of them.
>>>      
>>>
The numbers are no problem. I have seen grids with more than 1000 queue
instances, and grids with more than 300k jobs in the system. Neither was a
problem, provided the system is configured correctly and you have
appropriate hardware.

The only thing I would worry a bit about is the short run time of your
jobs. With a couple hundred exec hosts and 200k jobs in the system, the
utilization of your grid will most likely drop, because the scheduler has
to cope with that many jobs on every run and may not refill free slots as
fast as the short jobs finish.
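
To put a rough number on this: the sketch below computes how many jobs per
second have to be started just to keep every slot busy. It is only
back-of-the-envelope arithmetic, not anything SGE reports. The function name
is made up for illustration, and it assumes one slot per queue instance and
the 500 queue instances and 5 to 60 second runtimes from the question.

```python
def required_dispatch_rate(slots: int, job_runtime_s: float) -> float:
    """Jobs per second that must start to keep all slots busy.

    Assumption: every slot runs one job at a time and all jobs
    have the same runtime (steady state, throughput = slots / runtime).
    """
    if job_runtime_s <= 0:
        raise ValueError("job_runtime_s must be positive")
    return slots / job_runtime_s


if __name__ == "__main__":
    slots = 500  # assumed: one slot per queue instance
    for runtime_s in (5, 60):
        rate = required_dispatch_rate(slots, runtime_s)
        print(f"{runtime_s:>2}s jobs: {rate:.1f} job starts per second")
```

At the short end (5 second jobs) that is on the order of 100 job starts per
second, which is the load the scheduler and qmaster would have to sustain.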

When you set up your grid, please have a look at the performance tuning
HOWTO. It has not been updated yet and only covers the parameters available
in 6.0u1 and earlier. I think I should update it... :-)

Also take a look at the scheduler profiling. Based on my own tests, I
would say that the job runtime should be at least twice the maximum
scheduler runtime to achieve okay utilization.
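
One way to see why a factor of about two is plausible is a deliberately
simplified model, sketched below. It is an assumption for illustration, not
how SGE measures anything: a slot that becomes free is assumed to wait, on
average, half a scheduler run before it gets its next job. The function name
and the example numbers are hypothetical.

```python
def estimated_utilization(job_runtime_s: float, scheduler_runtime_s: float) -> float:
    """Rough per-slot utilization under a simplified model.

    Assumption: after a job ends, the slot sits idle for half a
    scheduler run on average before the next job is dispatched.
    """
    avg_idle_s = scheduler_runtime_s / 2.0
    return job_runtime_s / (job_runtime_s + avg_idle_s)


if __name__ == "__main__":
    scheduler_runtime_s = 10.0  # hypothetical maximum scheduler runtime
    for job_runtime_s in (5.0, 20.0, 60.0):
        u = estimated_utilization(job_runtime_s, scheduler_runtime_s)
        print(f"job {job_runtime_s:>4.0f}s, scheduler {scheduler_runtime_s:.0f}s: "
              f"about {u:.0%} utilization")
```

Under this model, a job runtime of twice the scheduler runtime gives about
80% utilization, while jobs shorter than the scheduler run drop to 50% or
less. Real numbers depend on the configuration, so the scheduler profiling
output is the thing to trust.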


>>>Also, since SGE uses RSH to dispatch jobs on remote hosts, is there an
>>>issue with having only a single master pushing all these jobs, e.g. will
>>>the single host run out of ports if there are 400-500 queue instances?
>>>
>>>Does anyone have experience with this, or can anyone give me some suggestions?
>>>
>>>Thanks.
>>>
>>>      
>>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>>    
>>
>
>
>
>  
>





