[GE users] new user setup help two different domains
dag at sonsorol.org
Thu May 25 11:38:33 BST 2006
The network layout of the public machine(s) front-ending the 45
compute nodes hidden on a private network is an extremely popular one.
Adding your Apple machines is easy in theory (Grid Engine does mixed-
architecture clusters well) but in practice the real outcome depends
on your network and firewall setups and how Grid Engine is set up on
the linux cluster.
I think, however, that you will most likely end up wiping and
reinstalling Grid Engine on your Mac systems. Taking a functional
small computer farm and trying to turn it into exec hosts belonging
to a different system is a big change and it will likely be easier
just to do this from scratch by running the "install_execd" scripts
(assuming SGE Version 6.0 or higher, the install script is different
in 5.x) all over again. You might want to break off one node from
your apple cluster and just use it as a testbed -- if you can add one
apple system without much trouble you can then add the remaining 6.
Things you should check on and confirm:
(1) Does the linux SGE master only accept grid communication over the
private linux network? Find out the TCP ports that SGE is configured
to use and make sure those ports are reachable on the network from
your apple systems.
(2) Can the apple nodes DNS-resolve the name of the linux SGE master
as Grid Engine believes it to be? There should be a file called
"act_qmaster" in $SGE_ROOT/$SGE_CELL/common/ -- the file contains the
name of the current grid engine master. This is the file that
sge_execd daemons read when trying to learn how to contact and
register with the master. This could end up being a "private"
network name for the Linux master. This is not a deal breaker though
as you can use the grid engine "host_aliases" file or other tricks to
get your apple nodes to see the SGE master. This step (understanding
the machine name that Grid Engine uses, how it resolves and how it is
reachable via the network the Apple nodes are on) is probably the
biggest thing you need to research before trying an experiment or two.
(3) Usernames. When you login as "you" on your linux cluster and
submit a job for execution on the Apple systems, will that account
exist on the apple nodes? Otherwise the job will fail with a "user
does not exist" error when it lands on the apple node.
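
Checks (2) and (3) can be scripted and run on one of the apple nodes.
What follows is only a rough sketch, not anything that ships with Grid
Engine: the function names are made up for illustration, the cell name
"default" is an assumption (use your real $SGE_CELL), and it assumes
the node already has a copy of the act_qmaster file to read.

```python
import pwd
import socket
from pathlib import Path


def read_act_qmaster(sge_root, sge_cell="default"):
    """Return the master name stored in $SGE_ROOT/$SGE_CELL/common/act_qmaster."""
    path = Path(sge_root) / sge_cell / "common" / "act_qmaster"
    return path.read_text().strip()


def resolve_host(name):
    """Return the IPv4 address this machine resolves `name` to, or None."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def account_exists(username):
    """Return True if `username` is a known account on this machine."""
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        return False


def check_node(sge_root, sge_cell, username):
    """Collect the results of checks (2) and (3) for this machine."""
    master = read_act_qmaster(sge_root, sge_cell)
    return {
        "master_name": master,
        "master_address": resolve_host(master),  # None means "does not resolve"
        "account_exists": account_exists(username),
    }
```

Calling check_node() with your own $SGE_ROOT, cell name and linux
username on an apple node shows at a glance whether the master name
resolves there and whether the account exists. A master_address of
None is the case where the host alias or DNS tricks come into play.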
Once you understand how the hostname/DNS issues are configured and
you are sure that your apple systems can reach the TCP ports required
on the master you can start testing things out.
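
The port check can be scripted too. This is a minimal sketch under a
few assumptions: that your install takes its ports from the
SGE_QMASTER_PORT / SGE_EXECD_PORT environment variables or from
"sge_qmaster" / "sge_execd" entries in /etc/services (my understanding
of the usual SGE 6.x convention, so confirm it against your own
setup), and that a plain TCP connect is a good enough test. The
function names are invented for this example.

```python
import os
import socket


def sge_port(env_var, service_name):
    """Look up a port from the environment, then from the services database.

    Returns None if neither source knows the port.
    """
    value = os.environ.get(env_var)
    if value:
        return int(value)
    try:
        return socket.getservbyname(service_name, "tcp")
    except OSError:
        return None


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unroutable, or name did not resolve
        return False
```

Run from an apple node, port_reachable(master_name, port) with the
name taken from act_qmaster tells you whether a firewall or the
private network is in the way. A False result only says the connect
failed, not why, so follow up with your network people.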
I'm assuming that your apple systems are not going to NFS mount the
$SGE_ROOT (would be easier if this was possible) -- without shared
NFS there are going to have to be some config files and data copied/
rsynced over to the apple systems. Probably a good place to learn how
this is done is to search the gridengine.sunsource.net site and find
the "NFS Reduction Howto" which explains how to run SGE with varying
levels of shared filesystems.
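
To give a feel for the copy step, here is a small sketch that only
builds an rsync command line and does not run anything. Treat the
choice of directory as an assumption on my part: I am guessing that
the "common" directory of the cell is what needs to be mirrored, and
the howto is the authority on what actually has to be present on each
host. The function name and the host name "mac01" are hypothetical.

```python
import shlex
from pathlib import PurePosixPath


def build_rsync_command(sge_root, sge_cell, remote_host, dry_run=True):
    """Build an rsync argv that mirrors $SGE_ROOT/$SGE_CELL/common to a host.

    Assumption: only the cell's "common" directory is copied, and it goes
    to the same path on the remote side.
    """
    common = PurePosixPath(sge_root) / sge_cell / "common"
    cmd = ["rsync", "-a"]
    if dry_run:
        cmd.append("--dry-run")  # show what would be copied, change nothing
    # Trailing slashes make rsync copy the directory contents, not the dir itself.
    cmd += [f"{common}/", f"{remote_host}:{common}/"]
    return cmd


def as_shell_line(cmd):
    """Render the argv as a single shell-quoted line for review."""
    return shlex.join(cmd)
```

For example, as_shell_line(build_rsync_command("/opt/sge", "default",
"mac01")) gives a line you can read over and paste into a terminal.
The dry-run default is deliberate, so nothing gets overwritten on a
node until you have looked at what rsync intends to do.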
So long story short, what you want is possible and commonly done but
the specifics depend on your local setup and you'll likely need to be
a comfortable SGE command line administrator to get things done in
the shortest amount of time.
On May 25, 2006, at 1:42 AM, Brett W Grant wrote:
> I have been using gridware for about a year now, but not
> administering it. Anyway, I have a network of computers that is
> administered by our IT dept. I am not sure of what you call it,
> but only the head node is visible on the network, the executions
> hosts are all hidden behind the one computer that users can log
> into. These computers are all linux boxes running RHE4. I think
> that I have about 45 of these machines.
> I have a small cluster of 7 macs running OSX that my group owns.
> Each computer is visible on the network. Due to the nature of the
> jobs that we run, unless I have a small group of jobs, I don't ever
> use these macs. Rather than just letting them sit, I would like to
> add them to my larger cluster. The IT department will not
> administer the macs, but they don't object if I add them.
> Here is my problem. I have no idea on how to add them. I have
> admin privileges on the macs, but not on the linux cluster. I am
> an sge administrator on both systems.
> I would assume that I need to shut down the grid that is currently
> running on the macs, but can I assume that I don't need to
> reinstall sge_execd?
> I went to the linux cluster qmon and tried to add an execution
> host, but it doesn't seem complete. For example, I know that I can
> only communicate with the macs using ssh, but I didn't see where I
> could set that.
> Perhaps I can't even do what I want. Perhaps I am just in over my
> head, but it seems like it should be possible. Maybe I am just
> looking at the wrong sections of the manual. Any help would be
> Brett Grant
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net