[GE users] Multiple Existing Clusters

Bradford, Matthew matthew.bradford at eds.com
Tue Nov 8 20:33:32 GMT 2005


Thanks for getting back to me.

For our test setup we are only looking at two clusters with 8 nodes
each, but in the real world we are looking at ~10 clusters with anywhere
from 32 to 128 nodes each.

Regarding the transfer queues: the clusters will all be within the same
domain, but as things stand, the clusters in the real environment each
have a duplicate file system rather than a common file system. This is
not a requirement that each cluster have its own file system; it is
simply how they have been set up. In our test setup, there will be a
common file system.



-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: 03 November 2005 22:05
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Multiple Existing Clusters

Hi Mat,

On 02.11.2005 at 15:24, Bradford, Matthew wrote:

> Dear all,
> We currently have a setup that consists of several PC clusters, each
> with its own master server and configuration, with which we have
> previously had very little involvement. The clusters normally run
> parallel MPI applications within their own cell. Each cluster is also
> set up so that the internal cluster nodes are hidden from the general
> network, with only the master server visible to the network. Normally,
> a user logs in remotely to the master server and uses it as a
> submission host. As is usual, each project submits its jobs to its
> own cluster, and there is no communication or integration between any
> of the clusters. All clusters are operating within the same domain.
> We are now looking at how this can be improved. There are probably
> several things I don't really understand, and I would appreciate any
> advice that I could get.
> 1. What is the best way of integrating all the clusters so that a user
> can submit a job without needing to submit it to a specific cluster?
>         If we were starting from scratch, I would assume that the
> simplest way would be not to run a master server on each
> independent cluster, but to have a single, central master server, with
> cluster queues and parallel environments set up to manage each
> cluster. The user would then only need to submit their job to the
> master server, stating that it should run within a parallel
> environment, without needing to identify the specific queue in which
> it runs (as explained in http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=13455 ).
> This type of setup would require that each node within the cluster be
> visible to the central master server, and therefore each node would
> require a separate network connection.
> 2. Is using TransferQueues a good way of integrating several clusters?
>         If we were not looking to modify the existing setup of the
> clusters a great deal, is the alternative to use TransferQueues? If I
> understand this correctly, a local cluster can have additional queues
> set up as transfer queues for each remote cluster that should be made
> available. A user would then log on to their own cluster, submit a
> job, and it may be sent to any of the other clusters for which a
> transfer queue exists. This mechanism would not require the internal
> cluster nodes to be made available to the general network, but would
> mean that each individual cluster would have to be administered
> separately.
> 3. Could we integrate the clusters using the Globus-SGE integration?
>         I am not sure how this would function, but I assume that a Globus
> component would act as the central submission point and would make
> submissions to any of the SGE-controlled clusters. Do the clusters
> feed back their resource levels to the Globus component, allowing
> Globus to decide which cluster the submitted job should be sent to? Is
> this really intended for global rather than campus grid setups?
> 4. Is there an alternative way of integrating several existing
> clusters into a single campus grid environment?
> I'm not sure if questions like this are suitable for this mailing
> list, but I'm not sure where else to go. I have looked at several
> "How to" docs, but am still a bit unclear on the best way forward.

You ordered the choices in the same order I would try them. For 1., the
question is whether you have managed switches, so that you could set up
a VPN for just the compute nodes and not have to worry about where each
node is located on campus. That network would certainly be separated
from the users' normal workstations and from the Internet, and not
reachable from the outside world. At some point the file server might
struggle to keep up, so the question is: how many grids and nodes are we
talking about?

Transfer queues are also a good choice, as they work without any
additional software needing to be installed. But the question is the
restriction mentioned on the Howto page: is it possible to have the
same namespace and a common filesystem?

-- Reuti

To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net