[GE users] Using cycles from a 2nd SGE cluster
dag at sonsorol.org
Fri Jan 8 12:56:42 GMT 2010
I see this a lot in my consulting work - the "multi cluster" request
usually comes from top level management who've been reading far too much
about grids and clouds. They just think it would be cool to "unify" the
various HPC systems in the organization and blindly issue the order to
look into it.
I'll give you the cynical industry answer ...
Yes it's technically possible via these methods:
(1) transfer queues
(2) suicidal rampage down the globus/meta-scheduler route
(3) Sun SDM
... but I've personally never seen this be truly successful in a
commercial/industry production computing environment, meaning one that is
not academic in nature or funded by defense/sovereign nation dollars.
The only working systems I've seen have been at academic sites with
*tons* of sysadmin resources, or toy/demo/playground setups purpose-built
for demonstrations.
So the technical answer in my opinion is "yes", but the practical answer
in real-world environments is usually "no". It is 100x harder when the
two systems are geographically separated or have separate filesystem and
UID/GID namespaces. Just an utter nightmare, and the level of
abstraction and wrapping needed to get anything done removes any
This is not the answer you want to hear but I'd recommend tackling the
political problems first to see if they can be addressed.
In the real world the most practical solution I've seen is that the two
groups agree to keep operating separate systems but when the next
upgrade/refresh period rolls around they get together, do some serious
planning and then roll out a new single unified HPC system that everyone
is happy to share.
In other projects the clusters have been relocated or rearchitected to
either share the same datacenter or at least the same identity server
and subnets so that future collaboration is easier.
From an IT or management perspective I also see a lot of cases where
central IT will build a big new cluster from scratch in order to tease
or lure the standalone cluster crowd onto their shared system. This can
be a multi-year task but the end result is that if you build a better
centralized resource and make it available you'll often be able to
consolidate and retire the smaller systems without anger or political
Just my $.02
> My sys admin has been trying to configure two independent Linux clusters with static SGE pools, such that when the first cluster's batch queue fills, additional jobs will fall over to a low-priority queue in the second cluster. Each cluster has its own master node, and it would be a political non-starter to change that. So far, my admin has not succeeded.
> Is his configuration with static pools workable?
> If so, we would welcome some guidance in configuring our SGE deployment to do this.
> We are beginning to wonder whether this is undoable with static pools, and need to switch to a dynamic pool.
> Input would be most welcome. Thanks! -Joe
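For illustration only, here is a minimal sketch of the routing decision Joe describes ("when the first cluster fills, spill over to a low-priority queue on the second"), done as a client-side submit wrapper rather than inside SGE itself. It assumes the submitting host is a valid submit host for both clusters, that free-slot counts for each cluster are gathered elsewhere (for example from `qstat -g c` run against each cell), and that selecting a cluster is done by setting `SGE_ROOT`/`SGE_CELL` before calling `qsub`. The `Cluster` type, `choose_submission` function, paths and the `low.q` queue name are hypothetical. It does nothing about the separate filesystem and UID/GID namespace problems described above, which are usually the hard part.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cluster:
    """Hypothetical description of one independently managed SGE cluster."""
    name: str
    sge_root: str    # SGE_ROOT the client commands should use for this cluster
    sge_cell: str    # SGE_CELL for this cluster (often "default")
    queue: str       # queue to request with "qsub -q"
    free_slots: int  # assumed to be collected elsewhere, e.g. from "qstat -g c"


def choose_submission(primary: Cluster, overflow: Cluster, script: str,
                      slots_needed: int = 1) -> Tuple[Dict[str, str], List[str]]:
    """Pick a cluster for one job and build the qsub call for it.

    Returns (environment overrides, argv). The job stays on the primary
    cluster while it has enough free slots, spills over to the overflow
    cluster's queue when the primary is full, and falls back to waiting
    in the primary queue if neither cluster has room.
    slots_needed only affects routing; no parallel-environment request
    is added to the qsub command here.
    """
    if primary.free_slots >= slots_needed:
        target = primary
    elif overflow.free_slots >= slots_needed:
        target = overflow
    else:
        target = primary
    env = {"SGE_ROOT": target.sge_root, "SGE_CELL": target.sge_cell}
    argv = ["qsub", "-q", target.queue, script]
    return env, argv


if __name__ == "__main__":
    a = Cluster("clusterA", "/opt/sge", "default", "all.q", free_slots=0)
    b = Cluster("clusterB", "/opt/sge-b", "default", "low.q", free_slots=12)
    env, argv = choose_submission(a, b, "job.sh")
    print(env, argv)
    # A real wrapper would then run something like:
    #   subprocess.run(argv, env={**os.environ, **env}, check=True)
```

Keeping the decision in a pure function makes the policy easy to test; the fragile pieces (polling each qmaster, staging input files, mapping users) would live around it.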