[GE users] new user setup help two different domains

Chris Dagdigian dag at sonsorol.org
Thu May 25 11:38:33 BST 2006


Hi Brett,

The network layout of the public machine(s) front-ending the 45  
compute nodes hidden on a private network is an extremely popular  
configuration.

Adding your Apple machines is easy in theory (Grid Engine does  
mixed-architecture clusters well) but in practice the real outcome depends  
on your network and firewall setups and how Grid Engine is set up on  
the linux cluster.

I think, however, that you will most likely end up wiping and  
reinstalling Grid Engine on your Mac systems. Taking a functional  
small computer farm and trying to turn it into exec hosts belonging  
to a different system is a big change and it will likely be easier  
just to do this from scratch by running the "install_execd" scripts  
(assuming SGE Version 6.0 or higher; the install script is different  
in 5.x) all over again.  You might want to break off one node from  
your apple cluster and just use it as a testbed -- if you can add one  
apple system without much trouble you can then add the remaining 6.

Things you should check on and confirm:

(1) Does the linux SGE master only accept grid communication over the  
private linux network? Find out the TCP ports that SGE is configured  
to use and make sure those ports are reachable on the network from  
your apple systems.

(2) Can the apple nodes DNS-resolve the name of the linux SGE master  
as Grid Engine believes it to be? There should be a file called  
"act_qmaster" in $SGE_ROOT/$SGE_CELL/common/ -- the file contains the  
name of the current grid engine master. This is the file that  
sge_execd daemons read when trying to learn how to contact and  
register with the master.  This could end up being a "private"  
network name for the Linux master.  This is not a deal breaker, though,  
as you can use the grid engine "host_aliases" file or other tricks to  
get your apple nodes to see the SGE master.  This step (understanding  
the machine name that Grid Engine uses, how it resolves and how it is  
reachable via the network the Apple nodes are on) is probably the  
biggest thing you need to research before trying an experiment or two.

(3) Usernames. When you log in as "you" on your linux cluster and  
submit a job for execution on the Apple systems, will that account  
exist on the apple nodes? If not, the job will fail with a "user does  
not exist" error when it lands on the apple node.

Once you understand how the hostname/DNS issues are configured and  
you are sure that your apple systems can reach the TCP ports required  
on the master you can start testing things out.
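
For the port check, something like the following could be run from an Apple node. Treat it as a sketch under assumptions: the function names are illustrative, and it assumes the qmaster port is published either through the SGE_QMASTER_PORT environment variable or through an "sge_qmaster" entry in the services database, which is where SGE 6.x normally looks. Your site may do it differently, so confirm the real port with whoever runs the linux cluster.

```python
import os
import socket


def configured_port(service, env_var):
    """Look up an SGE port: environment variable first, then the services database."""
    value = os.environ.get(env_var)
    if value:
        return int(value)
    try:
        return socket.getservbyname(service, "tcp")
    except OSError:
        # No matching services entry on this machine.
        return None


def port_open(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable names.
        return False
```

Call configured_port("sge_qmaster", "SGE_QMASTER_PORT") to find the port, then port_open() with the master name from act_qmaster and that port. A False result points at a firewall, a routing problem, or a master that only listens on the private network.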

I'm assuming that your apple systems are not going to NFS mount the  
$SGE_ROOT (would be easier if this was possible) -- without shared  
NFS there are going to have to be some config files and data  
copied/rsynced over to the apple systems. Probably a good place to learn how  
this is done is to search the gridengine.sunsource.net site and find  
the "NFS Reduction Howto" which explains how to run SGE with varying  
levels of shared filesystems.

So long story short, what you want is possible and commonly done but  
the specifics depend on your local setup and you'll likely need to be  
a comfortable SGE command line administrator to get things done in  
the shortest amount of time.

-Chris



On May 25, 2006, at 1:42 AM, Brett W Grant wrote:

> I have been using gridware for about a year now, but not  
> administering it.  Anyway, I have a network of computers that is  
> administered by our IT dept.  I am not sure of what you call it,  
> but only the head node is visible on the network, the execution  
> hosts are all hidden behind the one computer that users can log  
> into.  These computers are all linux boxes running RHE4.  I think  
> that I have about 45 of these machines.
>
> I have a small cluster of 7 macs running OSX that my group owns.   
> Each computer is visible on the network.  Due to the nature of the  
> jobs that we run, unless I have a small group of jobs, I don't ever  
> use these macs.  Rather than just letting them sit, I would like to  
> add them to my larger cluster.  The IT department will not  
> administer the macs, but they don't object if I add them.
>
> Here is my problem.  I have no idea on how to add them.  I have  
> admin privileges on the macs, but not on the linux cluster.  I am  
> an sge administrator on both systems.
>
> I would assume that I need to shut down the grid that is currently  
> running on the macs, but can I assume that I don't need to  
> reinstall sge_execd?
>
> I went to the linux cluster qmon and tried to add an execution  
> host, but it doesn't seem complete.  For example, I know that I can  
> only communicate with the macs using ssh, but I didn't see where I  
> could set that.
>
> Perhaps I can't even do what I want.  Perhaps I am just in over my  
> head, but it seems like it should be possible.  Maybe I am just  
> looking at the wrong sections of the manual.  Any help would be  
> appreciated.
>
> Thanks,
> Brett Grant
> ---------------------------------------------------------------------  
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net  
> For additional commands, e-mail: users-help at gridengine.sunsource.net
