[GE users] Multiple Existing Clusters

Bradford, Matthew matthew.bradford at eds.com
Wed Nov 2 14:24:20 GMT 2005


Dear all,

We currently have a set-up that consists of several PC clusters, each
with its own master server and configuration, which we have previously
had very little to do with. The clusters normally run parallel MPI
applications within their own cell. Each cluster is also set up so
that the internal cluster nodes are hidden from the general network,
with only the master server visible to the network. Normally, a user
remotely logs in to the master server and uses it as a submission host.
Each project submits its jobs to its own cluster, and there is no
communication or integration between any of the clusters. All
clusters are operating within the same domain.

We are now looking at how this can be improved. There are probably
several things I don't really understand, and I would appreciate any
advice that I could get.

1. What is the best way of integrating all the clusters so that a user
can submit a job without needing to submit it to a specific cluster?
	
	If we were starting from scratch then I would assume that the
simplest way would be not to have master servers running on each
independent cluster, but to have a single, central master server, with
cluster queues and parallel environments set up to manage each
cluster. The user would then only need to submit their job to the master
server, stating that it should run within a parallel environment,
without needing to identify the specific queue in which it runs. (As
explained in
http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=13455
). This type of set-up would require that each node within the cluster
is visible from the central master server, and therefore each node would
require a separate network connection.
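
	To make the idea concrete, here is a rough sketch in Python of the
behaviour I have in mind. It is only a model of my understanding, not
how the Grid Engine scheduler is actually implemented: the queue names,
the "mpi" parallel environment name and the pick_queue function are all
made up for illustration. The point is that the user names only a
parallel environment and a slot count (as with "qsub -pe mpi 32"), and
the central master picks a cluster queue that can hold the whole job.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ClusterQueue:
    name: str                  # hypothetical queue name, one per physical cluster
    pe_list: Tuple[str, ...]   # parallel environments attached to this queue
    free_slots: int            # slots currently unused on that cluster's nodes


def pick_queue(queues: Sequence[ClusterQueue], pe: str, slots: int) -> Optional[ClusterQueue]:
    """Return a queue that offers `pe` and has at least `slots` free slots.

    The whole job must fit inside one queue, so an MPI job never spans
    two physical clusters. Returns None if no queue can take the job.
    """
    candidates = [q for q in queues if pe in q.pe_list and q.free_slots >= slots]
    # Assumed policy: prefer the queue with the most free slots.
    return max(candidates, key=lambda q: q.free_slots, default=None)


# Hypothetical site with three clusters behind one central master.
queues = [
    ClusterQueue("cluster_a.q", ("mpi",), 16),
    ClusterQueue("cluster_b.q", ("mpi",), 64),
    ClusterQueue("cluster_c.q", ("serial",), 128),
]

chosen = pick_queue(queues, pe="mpi", slots=32)
print(chosen.name if chosen else "job stays pending")
```

In this sketch the user never mentions cluster_b.q; it is selected only
because it offers the requested parallel environment and has room.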

2. Is using TransferQueues a good way of integrating several clusters?

	If we were not looking to modify the existing set-up of the
clusters a great deal, is the alternative to use TransferQueues? If I
understand this correctly, a local cluster can have additional queues
set up as transfer queues for each remote cluster that should be made
available. A user would then log on to their own cluster, submit a job,
and it may be sent to any of the other clusters for which a transfer
queue exists. This mechanism would not require the internal cluster
nodes to be made available to the general network, but would mean that
each individual cluster would have to be administered separately.
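
	Again, as a rough sketch of how I picture this working (a model
of my understanding only, not the real TransferQueues implementation;
the cluster names, the "try local first" policy and the place_job
function are assumptions of mine):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Cluster:
    name: str
    free_slots: int
    # Names of remote clusters reachable through a transfer queue (hypothetical).
    transfer_targets: List[str] = field(default_factory=list)


def place_job(local: Cluster, clusters: Dict[str, Cluster], slots: int) -> Optional[str]:
    """Return the name of the cluster that would run the job, or None.

    Assumed policy: run locally if the job fits, otherwise forward it
    through the first transfer queue whose remote cluster has room.
    """
    if local.free_slots >= slots:
        return local.name
    for target_name in local.transfer_targets:
        remote = clusters[target_name]
        if remote.free_slots >= slots:
            return remote.name
    return None


# Hypothetical site: cluster "a" has transfer queues to "b" and "c".
clusters = {
    "a": Cluster("a", 4, ["b", "c"]),
    "b": Cluster("b", 8),
    "c": Cluster("c", 32),
}

print(place_job(clusters["a"], clusters, 16))
```

Here the user only ever talks to the master of cluster "a"; the nodes
of "b" and "c" stay hidden behind their own masters.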

3. Could we integrate clusters using the Globus-SGE integration?

	I am not sure how this will function, but I assume that a Globus
component will act as the central submission point, and will make
submissions to any of the SGE-controlled clusters. Do the clusters feed
back their resource levels to the Globus component, allowing Globus to
decide which cluster the submitted job should be sent to? Is this really
intended for global rather than campus grid set-ups?

4. Is there an alternative way of integrating several existing clusters
into a single campus grid environment?

I'm not sure if questions like this are suitable for this mailing list,
but I'm not sure where else to go. I have looked at several "How to"
docs, but am still a bit unclear on the best way forward.

Any help would be greatly appreciated.

Cheers,

Mat




