[GE users] sge v6.0u3 new installation issue with more than 1021 hosts.
macmccalla at hess.com
Thu Mar 10 14:42:57 GMT 2005
First, my environment: all Red Hat EL 3 (WS or ES) on dual Xeons.
I am moving my production grid from SGE 5.3p6 to SGE 6.0u3. The 5.3
installation is supporting about 900 hosts at this time.
The 6.0u3 system has been installed and running in test mode for a
couple of weeks now, supporting the same 890 hosts, and seemed to be
OK. I have been adding new hosts to the 6.0u3 system only, as they are
installed and become available. Yesterday, when the number of hosts
actually connected by execd went from 1021 to 1022, I noticed that
qmaster stopped responding on port 538 to any further requests from
additional execds or commands (qstat, qhost, etc.). The ulimit for file
descriptors is set to 4096 at qmaster startup (the info message at
qmaster startup says qmaster will use 4076 file descriptors for
communication). Has anyone else seen this problem, or does anyone have
a 6.0u3 installation with more hosts?
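For anyone wanting to reproduce the descriptor arithmetic above, here is a
minimal Python sketch. It assumes (hypothetically, this is not confirmed
by qmaster's documentation) that each connected execd holds one
communication descriptor, and it uses the standard `resource` module
(Unix only) to read the RLIMIT_NOFILE of whatever shell runs it. The
names `comm_fd_headroom` and `current_nofile_limits` are made up for
illustration. Under that assumption, 1022 hosts leave thousands of the
4076 descriptors unused, so the configured limit alone would not explain
the hang at 1022.

```python
import resource

# Figures quoted in the post: a 4096 fd ulimit, of which qmaster
# reported it would use 4076 for communication.
ULIMIT_NOFILE = 4096
QMASTER_COMM_FDS = 4076

def comm_fd_headroom(connected_hosts: int, comm_fds: int = QMASTER_COMM_FDS) -> int:
    """Descriptors left for communication, ASSUMING one fd per connected
    execd (a hypothetical simplification, not qmaster's real accounting)."""
    return comm_fds - connected_hosts

def current_nofile_limits() -> tuple:
    """Soft and hard RLIMIT_NOFILE of the current process (Unix only)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard

if __name__ == "__main__":
    soft, hard = current_nofile_limits()
    print(f"RLIMIT_NOFILE for this shell: soft={soft} hard={hard}")
    print("fds not used for communication:", ULIMIT_NOFILE - QMASTER_COMM_FDS)
    for hosts in (1021, 1022):
        print(f"{hosts} hosts -> {comm_fd_headroom(hosts)} descriptors to spare")
```

Run it from the same environment that starts qmaster, since the limit it
reports is inherited from the launching shell, not read from qmaster.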
Thanks in advance,