[GE users] limitation of the max slots or nodes

rayson rayrayson at gmail.com
Mon Nov 17 16:54:57 GMT 2008


On 11/17/08, holy <holy8086 at gmail.com> wrote:
> Hi All.
> I will use SGE 6.2 in a large-scale grid.

How large? :D

SGE runs on "Ranger", which has close to 4000 nodes.

http://www.tacc.utexas.edu/resources/hpcsystems/#ranger

> When GE starts, some warning messages about file descriptors are written to
> the spool/qmaster/messages file.

That's a limit of Linux's implementation of the select() system call.

If you use a 64-bit Solaris machine as the qmaster, your limit would
be up to around 60,000 hosts.

> Is there some limitation of the maximum number of slots or number of nodes
> to be able to operate GE?

Not number of slots, but the number of nodes.

(And FYI: Ranger has 62,976 processor cores, but only has 3,936 nodes.)

> When the number of nodes or slots in a CELL is increased, are more file
> descriptors (and so on) consumed?

When another node is added to the cluster, the qmaster needs another
file descriptor as a channel to talk to the execution daemon on that
node. But on Linux, the system call select() by default can't handle
over 1024 file descriptors.
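
To put rough numbers on that, here is a minimal sketch in Python (not
part of SGE). It assumes the usual FD_SETSIZE of 1024 and one descriptor
per execution host; max_exec_hosts() and the reserve of 50 descriptors
for the qmaster's own files and client connections are hypothetical,
chosen only for illustration.

```python
import resource

# Typical compile-time FD_SETSIZE on Linux; assumed here, not read from
# the running system. select() cannot watch descriptors numbered at or
# above this value.
FD_SETSIZE = 1024


def max_exec_hosts(fd_setsize, reserved_fds):
    """Rough ceiling on execution hosts at one descriptor per host."""
    return max(fd_setsize - reserved_fds, 0)


# The per-process descriptor limit (ulimit -n) is a separate limit;
# raising it does not lift the select() ceiling.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("RLIMIT_NOFILE soft/hard:", soft, hard)

# Hypothetical reserve of 50 descriptors for logs, spool files, clients.
print("approx. host ceiling:", max_exec_hosts(FD_SETSIZE, reserved_fds=50))
```

The point of the sketch is only that the ceiling comes from select()
itself, so it applies even when the per-process limit is set higher.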

Note that the SGE build machine was patched to raise the limit to
over 8000. If you have more than 8000 nodes, you will still need
either to patch your own build machine and build your own qmaster on
it, or to switch to 64-bit Solaris for your qmaster machine.

Rayson


>
> Thanks,
> holy
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88895
