[GE users] Using cycles from a 2nd SGE cluster

craffi dag at sonsorol.org
Fri Jan 8 12:56:42 GMT 2010


I see this a lot in my consulting work - the "multi cluster" request 
usually comes from top level management who've been reading far too much 
about grids and clouds. They just think it would be cool to "unify" the 
various HPC systems in the organization and blindly issue the order to 
look into it.

I'll give you the cynical industry answer ...

Yes, it's technically possible via these methods:

(1) transfer queues
(2) a suicidal rampage down the Globus/meta-scheduler route
(3) Sun SDM (Service Domain Manager)

... but I've personally never seen this really succeed in a 
commercial/industry production computing environment, as opposed to 
one that is academic in nature or funded by defense/sovereign-nation 
dollars.
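
For what it's worth, (1) is the easiest to prototype: you carve out a 
"transfer" queue on cluster A whose only job is to re-submit work to 
cluster B and hold the local slot until the remote job finishes. A 
rough sketch of the starter script, assuming cluster B's qsub is 
reachable from cluster A's exec hosts; the install path, cell and 
queue names below are made-up placeholders:

    #!/bin/sh
    # starter_method for a "transfer" queue on cluster A. SGE hands
    # this script the job command as its arguments; we forward it to
    # cluster B and block with -sync y so the local slot stays
    # occupied while the job runs remotely.
    export SGE_ROOT=/opt/sge-clusterB   # cluster B install (placeholder)
    export SGE_CELL=default
    export PATH=$SGE_ROOT/bin/$($SGE_ROOT/util/arch):$PATH
    exec qsub -sync y -q lowprio.q "$@"

You would point the transfer queue at that script via its 
starter_method attribute (qconf -mq transfer.q) and cap the queue's 
slots at however many jobs you're willing to forward. Even this toy 
version quietly assumes shared filesystems and matching UIDs/GIDs 
across both clusters, which is exactly where the pain starts.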

The only working systems I've seen have been at academic sites with 
*tons* of sysadmin resources or toy/demo/playground setups purpose built 
for demonstration purposes.

So the technical answer in my opinion is "yes", but the practical 
answer in real-world environments is usually "no". It is 100x harder 
when the two systems are geographically separated or have separate 
filesystems and UID/GID namespaces as well. It turns into an utter 
nightmare, and the level of abstraction and wrapping needed to get 
anything done removes any efficiency gained.

This is not the answer you want to hear but I'd recommend tackling the 
political problems first to see if they can be addressed.

In the real world the most practical solution I've seen is that the two 
groups agree to keep operating separate systems but when the next 
upgrade/refresh period rolls around they get together, do some serious 
planning and then roll out a new single unified HPC system that everyone 
is happy to share.

In other projects the clusters have been relocated or rearchitected to 
either share the same datacenter or at least the same identity server 
and subnets so that future collaboration is easier.

From an IT or management perspective, I also see a lot of cases where 
central IT will build a big new cluster from scratch in order to tease 
or lure the standalone-cluster crowd onto their shared system. This 
can be a multi-year effort, but the end result is that if you build a 
better centralized resource and make it available, you'll often be 
able to consolidate and retire the smaller systems without anger or 
political hassles.

Just my $.02

-Chris





rhierlmeier wrote:
> My sys admin has been trying to configure two independent, linux clusters with static SGE pools, such that when the first cluster's batch queue fills, additional jobs will spill over to a low-priority queue in the second cluster. Each cluster has its own master node, and it would be a political non-starter to change that. So far, my admin has not succeeded.
>
> Is his configuration with static pools workable?
> If so, we would welcome some guidance in configuring our SGE deployment to do this.
>
> We are beginning to wonder whether this is simply not doable with static pools, and whether we need to switch to a dynamic pool.
>
> Input would be most welcome.  Thanks!  -Joe
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=237363
