[GE users] How Many Resources Is Too Many?

templedf dan.templeton at sun.com
Fri May 8 21:05:41 BST 2009

I'm working on a deep integration between Hadoop and SGE, which 
requires SGE to be able to schedule against the HDFS data.  The most 
effective way I have come up with to do that is to model the HDFS data 
blocks as boolean resources reported by the execds.  Effective, but not 
efficient.  The problem is that this approach will result in at least 
one such resource for every file in the HDFS, and more for large files.  
For a large file system, that could mean thousands, maybe tens (or even 
hundreds) of thousands, of resources, with each host being assigned 
hundreds or thousands.  Based on previous customer experiences, I'd say 
that's a really bad idea, but I thought I'd check to see what experience 
others have had with massive numbers of resources.  Anyone want to 
share?  Anyone (Roland) want to suggest what the practical upper bound 
on the number of resources should be?
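
To put rough numbers on the scale described above, here is a minimal 
back-of-the-envelope sketch.  It assumes one boolean resource per HDFS 
block, at least one per file, and that every host holding a replica 
reports that block.  The function name, the 64 MB block size, the 
replication factor of 3, and the example cluster are all illustrative 
assumptions, not figures from a real deployment.

```python
def block_resource_counts(file_sizes_bytes, num_hosts,
                          block_size=64 * 1024**2, replication=3):
    """Estimate (total resources, average resources per host).

    Assumptions (hypothetical, for illustration only):
    - one boolean resource per HDFS block, at least one per file
    - each block is reported by `replication` hosts
    - replicas are spread evenly across `num_hosts`
    """
    total = 0
    for size in file_sizes_bytes:
        # Integer ceiling division: number of blocks the file occupies.
        blocks = -(-size // block_size)
        # Count at least one resource per file, as in the description above.
        total += max(1, blocks)
    per_host = total * replication / num_hosts
    return total, per_host


# Hypothetical cluster: 50,000 files of 200 MB each on 100 hosts.
sizes = [200 * 1024**2] * 50_000
total, per_host = block_resource_counts(sizes, num_hosts=100)
print(total, per_host)
```

Under these assumed inputs each 200 MB file spans four blocks, which 
is how even a modest file count reaches hundreds of thousands of 
resources overall and thousands per host.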

(Anyone want to suggest an alternative approach?  I have plans B through 
E, but I'm certainly open to input.)


