[GE users] Grid Installation for first time

Dan Gruhn Dan.Gruhn at Group-W-Inc.com
Thu Mar 10 15:17:38 GMT 2005


Walter,

Did you update the services file on the "other host"?  That is, the
machine that is not the queue master?
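
One rough way to check this on each host is a small script like the sketch below. It is not part of the Grid Engine distribution, and the function names are made up for illustration. It assumes the services file lives at /etc/services (typical on Linux) and that the entries are named sge_qmaster and sge_execd, as in the message quoted below.

```python
# Sketch: confirm that a services file defines both Grid Engine entries.
# Assumptions: the file is /etc/services, and the entry names are
# sge_qmaster and sge_execd. The helper names here are hypothetical.

REQUIRED = ("sge_qmaster", "sge_execd")


def find_sge_services(text):
    """Return {name: (port, protocol)} for the required entries found in text."""
    found = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # not a "name port/protocol" entry
        name = fields[0]
        port, proto = fields[1].split("/", 1)
        if name in REQUIRED and port.isdigit():
            found[name] = (int(port), proto)
    return found


def missing_sge_services(text):
    """Return the sorted list of required entries that are absent from text."""
    return sorted(set(REQUIRED) - set(find_sge_services(text)))


if __name__ == "__main__":
    try:
        with open("/etc/services", errors="replace") as f:
            contents = f.read()
    except OSError as exc:
        print("could not read /etc/services:", exc)
    else:
        print("found:", find_sge_services(contents))
        print("missing:", missing_sge_services(contents))
```

Running it on both machines and comparing the output should show whether the execution host is missing an entry or has a different port number than the queue master.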

Also, what are your O/S, processor, etc.?

Dan

On Thu, 2005-03-10 at 10:12, Walter Faleiro wrote:

> Hello Grid Users,
> I am trying to install Grid 6.0u3 on two Linux machines. I have
> followed this procedure:
>  
> Untar the grid-common and grid-bin files.
>  
> Add
>     sge_qmaster 536/tcp
>     sge_execd 537/tcp
>  
> to the services file.
>  
> Run the install_qmaster and install_execd scripts.
>  
> My machine is configured as queue master, administrative host,
> execution host and submit host.
>  
> I need some help running install_execd on the other machine.
> When I run the script, it exits saying the queue master installation
> is not done. Do I need to install the queue master on the execution
> hosts as well? I followed the documentation on the Sun docs, and
> nowhere does it mention installing the queue master on the execution
> hosts.
>  
>  
> Thanks Walter.
> 
>         -----Original Message-----
>         From: McCalla, Mac [mailto:macmccalla at hess.com]
>         Sent: Thursday, March 10, 2005 6:43 AM
>         To: users at gridengine.sunsource.net
>         Subject: [GE users] sge v6.0u3 new installation issue with
>         more than 1021 hosts.
>         
>         
>         
>         
>         Hi folks,
>         
>         First, my environment is all Red Hat EL WS or ES 3, on dual
>         Xeons.
>         I am moving my production grid from sge 5.3p6 to sge 6.0u3.
>         The 5.3 installation is supporting about 900 hosts at this
>         time.
>         
>         The 6.0u3 system has been installed and running for a couple
>         of weeks now in test mode, supporting the same 890 hosts,
>         and seemed to be OK.  I have been adding some new hosts, as
>         they become available, to only the 6.0u3 system.  Yesterday,
>         when the number of hosts actually connected by execd passed
>         from 1021 to 1022, I noticed that qmaster stopped responding
>         on port 538 to any further requests from additional execds or
>         commands (qstat, qhost, etc.).  The ulimit for fds is set at
>         4096 at qmaster startup (the info message at qmaster startup
>         says qmaster will use 4076 file descriptors for
>         communication).  Has anyone else seen this problem, or does
>         anyone have a 6.0u3 installation with more hosts?
>         
>         thanks in advance,
>         Mac McCalla
>         


