[GE users] Large Rocks/Oscar clusters... (Linux select(2) bug)

Rayson Ho rayrayson at gmail.com
Thu Nov 17 17:21:20 GMT 2005



Thanks Christian for the reply...

With select(2), when the cluster is larger than around 1000 nodes, we
hit the Linux header file bug, i.e. the fixed FD_SETSIZE limit on the
descriptor sets (we fixed this in later 6.0 updates by patching the
system header file). With poll(2), we should be able to handle a larger
number of connections, and thus support more nodes under a single
master.
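
To illustrate the difference, here is a minimal sketch of a poll(2) based
readiness check. It is not the Grid Engine code; count_readable() is a
hypothetical helper name. The point is that poll(2) takes a caller-sized
array of struct pollfd, so descriptor values at or above FD_SETSIZE are
not a problem the way they are with select(2).

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns how many of the n descriptors in fds are
 * readable within timeout_ms milliseconds, or -1 on error. */
int count_readable(const int *fds, size_t n, int timeout_ms)
{
    assert(fds != NULL && n > 0);

    /* One pollfd entry per connection; the array size is chosen by the
     * caller and is not tied to FD_SETSIZE. */
    struct pollfd *pfds = calloc(n, sizeof *pfds);
    if (pfds == NULL)
        return -1;

    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }

    int rc = poll(pfds, (nfds_t)n, timeout_ms);
    int ready = 0;
    if (rc < 0) {
        ready = -1;
    } else if (rc > 0) {
        /* Count the entries that reported data available to read. */
        for (size_t i = 0; i < n; i++)
            if (pfds[i].revents & POLLIN)
                ready++;
    }

    free(pfds);
    return ready;
}
```

A caller would pass the array of open connection descriptors and a
timeout, then read from whichever entries reported POLLIN.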

However, what if the number of nodes is larger than the number of file
descriptors available? If we keep all the connections alive, then
qmaster will eventually run out of file descriptors.

So using poll does not fix the fundamental problem...
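
As a rough sketch of that limit (an illustration only, not anything from
the qmaster source), a process can ask for its per-process descriptor
limit with getrlimit(2) and RLIMIT_NOFILE. can_hold_connections() and its
"reserved" parameter are hypothetical names used here for the example.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if the soft RLIMIT_NOFILE limit leaves
 * room for `connections` sockets plus `reserved` other descriptors
 * (log files, stdio, ...), 0 if it does not, and -1 on error. */
int can_hold_connections(unsigned long connections, unsigned long reserved)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;

    /* The soft limit never exceeds the hard limit. */
    assert(rl.rlim_cur <= rl.rlim_max);

    /* Guard against overflow when adding the two counts. */
    if (connections > ULONG_MAX - reserved)
        return 0;

    return (rlim_t)(connections + reserved) <= rl.rlim_cur;
}
```

Whatever multiplexing call is used, one permanently open connection per
node still needs one descriptor per node, so this check fails once the
node count passes the limit.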

Rayson



On 11/16/05, christian reissmann <Christian.Reissmann at sun.com> wrote:
> We are already in the QA phase for the 6.0s2 release, so for
> 6.0s2 it's not an option. I will try to test a poll() implementation
> for the next release. We also have to be sure that any dependence on
> FD_SETSIZE is checked/reviewed in the entire source code.
>
> It's also a problem to test the fix, because I don't have access
> to clusters with more than 1024 hosts.
>
> Best regards,
>
> Christian
