[GE users] SMP idea for containing multithreading
landman at scalableinformatics.com
Mon Nov 3 00:00:58 GMT 2008
Chris Jewell wrote:
> Hi all,
> We're looking at a new cluster in my department for mini-HPC purposes.
> I'm interested in using GE as it provides a nice way for users to invoke
> interactive shells, whilst retaining load-balancing and control of
> oversubscription (in the way that SSI platforms, eg Kerrighed, don't).
> Although most of the users will be using software like R and Matlab
> which (to the best of my knowledge) are serial applications, I envisage
R/Parallel was released (recently) and it has MPI extensions. Matlab
runs in parallel, though there are license issues to deal with.
> an increase in the use of multithreaded software. My experience of GE
> is that it can work well for SMP applications where a job requires the
> maximum number of processors on a node, but for jobs with fewer
> /requested/ processors, threads can still spawn across an entire
> execution node. So, firstly, is there a standard way of limiting the
> number of processors that a user's job can use /on the execution node/?
> I'm suspecting no...?
Yes ... through the OS, and through the scheduler ... but the first
isn't as easy, and the second may not be respected by the application.
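
To make the two routes concrete, here is a minimal sketch of a job wrapper
that combines them. It is not part of the original reply and rests on some
assumptions: the job runs on Linux (os.sched_setaffinity is Linux-only), Grid
Engine has exported NSLOTS into the job environment, and the application
either uses OpenMP (so it reads OMP_NUM_THREADS) or is simply confined by the
CPU affinity mask. The helper names (requested_slots, pick_cpus, thread_env,
run_limited) are hypothetical. Picking the lowest-numbered CPUs is a naive
choice: two jobs on the same node would land on the same cores, so a real
setup would need some per-node bookkeeping.

```python
import os
import sys


def requested_slots(env=None, default=1):
    """Return the slot count from NSLOTS, falling back to `default`."""
    env = os.environ if env is None else env
    try:
        return max(1, int(env.get("NSLOTS", default)))
    except ValueError:
        return default


def pick_cpus(allowed, nslots):
    """Choose up to `nslots` CPU ids from `allowed` (naive: lowest ids first)."""
    return set(sorted(allowed)[:nslots])


def thread_env(nslots):
    """Scheduler-side hint: only cooperative (OpenMP) codes respect this."""
    return {"OMP_NUM_THREADS": str(nslots)}


def run_limited(argv):
    """Apply both limits, then replace this process with the real job."""
    nslots = requested_slots()
    os.environ.update(thread_env(nslots))
    if hasattr(os, "sched_setaffinity"):  # OS-side limit, Linux only
        allowed = os.sched_getaffinity(0)
        os.sched_setaffinity(0, pick_cpus(allowed, nslots))
    os.execvp(argv[0], argv)


if __name__ == "__main__" and len(sys.argv) > 1:
    run_limited(sys.argv[1:])
```

Saved under a hypothetical name such as limit_cpus.py, the job script would
call it as "python limit_cpus.py <command> [args...]". The affinity mask is
inherited by child processes and threads, so the job stays on the chosen
cores even if it ignores OMP_NUM_THREADS and spawns more threads than slots.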
> So, an idea I had was to get SGE to spawn a virtual host on the
> execution node that uses the requested number of processors. Then, the
> job would be sandboxed in to however many processors are assigned to the
> virtual host. However, I have never messed about with virtual hosts in
Hmmm ... ok
> an HPC environment, so I really don't know what the performance issues
> might be. I wonder if any of you, with greater knowledge on these
> things than myself, would care to comment?
Since you are looking at R and Matlab, I wouldn't expect them to be
heavy on IO (well it is possible ... if you are analyzing huge data sets
with lots of IO calls). So likely most of the virtualized code will be
running effectively natively. You could do this through Xen, VMware,
and others. Some people have done studies on this sort of usage.
Impact ranged from 5% at the low end to over 50% at the high end for
the tests we have seen (sadly, not published for a number of reasons).
More recent results may have been published which include additional
products. I would imagine we will hear someone touting VirtualBox
shortly, though I am not sure if this has been tested in this paradigm
yet. I am aware of Xen, VMware, and numerous others that have.
Regardless of the specific VM technology, what you want to do is
possible, and people have been doing it. The question is whether or
not the impact will be substantial, and this is in part determined by
the product, the application mix, and which code paths the application
exercises.
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615