[GE users] SGE noobie

dangruhn Dan.Gruhn at groupw.com
Wed May 27 13:38:37 BST 2009


As another small point, IANA has now defined ports for sge_qmaster 
(6444) and sge_execd (6445). If there is no particular reason you have 
used 288, I suggest using the defined ports.
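
A minimal sketch of what that could look like, assuming your install 
picks the ports up from the SGE_QMASTER_PORT and SGE_EXECD_PORT 
environment variables or from sge_qmaster/sge_execd entries in 
/etc/services (check the install guide for your version). The snippet 
only prints the services lines; it does not edit any file.

```shell
# Assumed values: the IANA-registered ports mentioned above.
SGE_QMASTER_PORT=6444
SGE_EXECD_PORT=6445
export SGE_QMASTER_PORT SGE_EXECD_PORT

# The matching /etc/services entries, printed for reference only.
printf 'sge_qmaster %s/tcp\n' "$SGE_QMASTER_PORT"
printf 'sge_execd   %s/tcp\n' "$SGE_EXECD_PORT"
```

Whichever way you set them, the values need to be the same on the 
master and on every exec host.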

Dan

craffi wrote:
> Is there a technical or other reason why you have "/SGE6" and "/SGE6b" ?
>
> Ideally you need a consistent and shared path so that it is "/SGE6"  
> everywhere.
>
> - NFS export "/SGE6" on your master host
> - Mount machine1:/SGE6 to /SGE6 on all your other hosts
>
> Then your qmaster will write its hostname to
> /SGE6/default/common/act_qmaster. When you run ./install_execd on
> machine2 from /SGE6, it will detect the presence of the cell directory
> ("default/"), automatically read default/common/act_qmaster, and
> register with the qmaster.
>
> The "connection refused" error usually indicates a firewall problem or
> a problem with multiple NICs on the master host, but as you mentioned,
> this error does not matter since it's trying to contact the wrong host
> anyway. There is a chance that you have old or incorrect data in
> /SGE6b and the installer is somehow picking up the wrong master from
> there.
>
> Again, unless there is a technical, security or political reason for
> doing things differently, you really should have paths that are the
> same regardless of host. This means shared home directories for users
> and a shared $SGE_ROOT at a consistent path.
>
> -Chris
>
>
>
>
> On May 26, 2009, at 5:17 PM, biostat wrote:
>
>   
>> I have installed the qmaster on my master host computer, machine1,
>> under the directory /SGE6. I then NFS-exported that folder to
>> machine2. On machine2, I ran ./install_execd (not from the
>> NFS-mounted directory, but from a local directory, /SGE6b). During
>> the execd install, it asked for my root path, and I gave it the
>> NFS-mounted folder. However, a few steps later, I ran into this
>> problem:
>>
>> Checking hostname resolving
>> ---------------------------
>>
>> Cannot contact qmaster. The command failed:
>>
>>   ./bin/darwin-x86/qconf -sh
>>
>> The error message was:
>>
>>   error: commlib error: can't connect to service (Connection refused)
>> ERROR: unable to contact qmaster using port 288 on host  
>> "machine2.xxx.yyy.edu"
>>
>> You can fix the problem now or abort the installation  procedure.
>> The problem can be:
>>
>>   - the qmaster is not running
>>   - the qmaster host is down
>>   - an active firewall blocks your request
>>
>> Contact qmaster again (y/n) ('n' will abort) [y] >>
>>
>>
>> The qmaster *is* running on machine1, but the installer doesn't seem
>> to be looking at machine1. It's looking at machine2.
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=199058
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>     
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=199091
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>   
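
To see which master each directory actually points at, something like 
the following might help. It is only a sketch: show_act_qmaster is a 
made-up helper name, and it assumes the cell is called "default" unless 
you pass another name.

```shell
# Hypothetical helper: print the qmaster host recorded in a cell directory.
# $1 = SGE root (e.g. /SGE6), $2 = cell name (defaults to "default").
show_act_qmaster() {
    file="$1/${2:-default}/common/act_qmaster"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "no readable act_qmaster at $file" >&2
        return 1
    fi
}
```

Running "show_act_qmaster /SGE6" and "show_act_qmaster /SGE6b" on 
machine2 should show whether the local copy names a different host than 
the shared one.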

-- 
Dan Gruhn
Group W Inc.
8315 Lee Hwy, Suite 303
Fairfax, VA, 22031
PH: (703) 752-5831
FX: (703) 752-5851

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=199196

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
