[GE users] Grid Installation for first time

Walter Faleiro walter at marfic.com
Thu Mar 10 15:25:37 GMT 2005

Hello Dan,
I have updated the services file on the other host as well. Do I need to have a common $SGE_ROOT? I tried using an NFS partition, mounted on the execution host, as $SGE_ROOT, but it gave me permission errors while installing.
My OS is Red Hat Linux 9 and the processor is an Intel Pentium 4 at 2.4 GHz.
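
A quick way to narrow down the permission errors is to test whether the installing user can actually create files under $SGE_ROOT on the execution host. The sketch below does only that; the helper name `can_write` is hypothetical and not part of Grid Engine. On NFS exports the root user is often mapped to an unprivileged user (root_squash), which is one possible cause, though nothing in this thread confirms it.

```python
import os
import tempfile


def can_write(directory):
    """Return True if the current user can create and remove a file in `directory`."""
    try:
        fd, path = tempfile.mkstemp(prefix="sge_write_test_", dir=directory)
    except OSError:
        # Covers a missing directory as well as permission problems.
        return False
    os.close(fd)
    os.remove(path)
    return True


# SGE_ROOT is read from the environment; the fallback to the current
# directory is only there so the sketch still runs when it is unset.
sge_root = os.environ.get("SGE_ROOT", ".")
print(sge_root, "writable:", can_write(sge_root))
```

Running it as the same user that runs the install script, on the execution host, shows whether the mount itself refuses writes before the installer is involved.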

-----Original Message-----
From: Dan Gruhn [mailto:Dan.Gruhn at Group-W-Inc.com]
Sent: Thursday, March 10, 2005 7:18 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Grid Installation for first time


Did you update the services file on the "other host"?  That is, the machine that is not the queue master?
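
One way to verify the services entries on each host is to parse the file and compare it with the expected ports. This is a minimal sketch: the port numbers are the ones quoted later in this thread (536 and 537), and the helper names `parse_services` and `check_sge_services` are hypothetical, not part of Grid Engine.

```python
# Expected entries, taken from the ports quoted in this thread.
EXPECTED = {("sge_qmaster", "tcp"): 536, ("sge_execd", "tcp"): 537}


def parse_services(text):
    """Map (service name, protocol) -> port for each entry in services-format text."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # blank or malformed line
        port, proto = fields[1].split("/", 1)
        if port.isdigit():
            entries[(fields[0], proto)] = int(port)
    return entries


def check_sge_services(text, expected=EXPECTED):
    """Return a list of problems; an empty list means every expected entry matches."""
    found = parse_services(text)
    problems = []
    for (name, proto), port in expected.items():
        actual = found.get((name, proto))
        if actual is None:
            problems.append(f"{name}/{proto} is missing")
        elif actual != port:
            problems.append(f"{name}/{proto} is {actual}, expected {port}")
    return problems


# Sample input standing in for the contents of a real services file.
sample = """\
sge_qmaster 536/tcp   # Grid Engine qmaster
sge_execd   537/tcp
"""
print(check_sge_services(sample))
```

On a real host the text would typically come from /etc/services, and the same check should give the same result on the qmaster and on every execution host.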

Also, what are your O/S, processor, etc.?


On Thu, 2005-03-10 at 10:12, Walter Faleiro wrote: 

Hello Grid Users,
I am trying to install Grid Engine 6.0u3 on two Linux machines. I followed this procedure:
Untar the grid-common and grid-bin files.
Add
    sge_qmaster 536/tcp
    sge_execd 537/tcp
to the services file.
Run the install_qmaster and install_execd scripts.
My machine is configured as the queue master, administrative host, execution host and submit host.
I need some help running install_execd on the other machine. When I run the script, it exits saying the qmaster installation is not done. Do I need to install the qmaster on the execution hosts as well? I followed the documentation on the Sun docs site, and nowhere does it mention installing the qmaster on the execution hosts.
Thanks, Walter.

[Walter Faleiro] 
From: McCalla, Mac [mailto:macmccalla at hess.com]
Sent: Thursday, March 10, 2005 6:43 AM
To: users at gridengine.sunsource.net
Subject: [GE users] sge v6.0u3 new installation issue with more than 1021 hosts.

Hi folks,

First, my environment is all Red Hat EL WS or ES 3, on dual Xeons.
I am moving my production grid from SGE 5.3p6 to SGE 6.0u3. The 5.3 installation is supporting about 900 hosts at this time.

The 6.0u3 system has been installed and running for a couple of weeks now in test mode, supporting the same 890 hosts, and seemed to be OK. I have been adding some new hosts, as they become available, to the 6.0u3 system only. Yesterday, when the number of hosts actually connected by execd passed from 1021 to 1022, I noticed that qmaster stopped responding on port 538 to any further requests from additional execds or commands (qstat, qhost, etc.). The ulimit for file descriptors is set at 4096 at qmaster startup (the info message at qmaster startup says qmaster will use 4076 file descriptors for communication). Has anyone else seen this problem, or does anyone have a 6.0u3 installation with more hosts?
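
The descriptor arithmetic above can be sketched as follows. The reserve of 20 descriptors is an assumption inferred from the two figures in the startup message (4096 - 4076), and the model of one descriptor per connected execd is also an assumption; the function names are hypothetical.

```python
import resource

RESERVED_FDS = 20  # assumption: inferred from 4096 - 4076 in the startup message


def fd_budget(soft_limit, reserved=RESERVED_FDS):
    """Descriptors left for communication once `reserved` are set aside."""
    return max(soft_limit - reserved, 0)


def headroom(soft_limit, connected_hosts, reserved=RESERVED_FDS):
    """Spare descriptors, assuming one descriptor per connected execd."""
    return fd_budget(soft_limit, reserved) - connected_hosts


# Limits of the current process, for comparison with what qmaster reports.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)
print("headroom at 1022 hosts with a 4096 limit:", headroom(4096, 1022))
```

Under these assumptions the stall at 1022 hosts is far below the 4076-descriptor budget, which suggests the ulimit itself is not the limiting factor. A fixed 1024-descriptor limit elsewhere (for example the FD_SETSIZE of select() on Linux) is one candidate worth checking, though nothing in this thread confirms it.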

thanks in advance,
Mac McCalla
