[GE users] SMP idea for containing multithreading

Joe Landman landman at scalableinformatics.com
Mon Nov 3 00:00:58 GMT 2008

Chris Jewell wrote:
> Hi all,
> We're looking at a new cluster in my department for mini-HPC purposes.
> I'm interested in using GE as it provides a nice way for users to invoke
> interactive shells, whilst retaining load-balancing and control of
> oversubscription (in the way that SSI platforms, eg Kerrighed, don't).
> Although most of the users will be using software like R and Matlab
> which (to the best of my knowledge) are serial applications, I envisage

Hi Chris

  R/Parallel was released (recently) and it has MPI extensions.  Matlab 
runs in parallel, though there are license issues to deal with.

> an increase in the use of multithreaded software.  My experience of GE
> is that it can work well for SMP applications where a job requires the
> maximum number of processors on a node, but for jobs with fewer
> /requested/ processors, threads can still spawn across an entire
> execution node.  So, firstly, is there a standard way of limiting the
> number of processors that a user's job can use /on the execution node/?
> I'm suspecting no...?

Yes ... through the OS, and through the scheduler ... but the first 
isn't as easy, and the second may not be respected by the application.
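
As a rough illustration of the OS route, here is a minimal Python sketch 
of a wrapper that pins a job to a fixed number of cores on Linux before 
starting it.  It assumes the slot count granted by the scheduler is 
available in the NSLOTS environment variable; the helper names pick_cpus 
and confine are made up for this example and are not part of GE.  It 
naively takes the lowest-numbered cores, so two jobs on the same node 
would land on the same cores unless something else hands out distinct 
core sets.

```python
import os
import sys


def pick_cpus(nslots, available):
    """Return the `nslots` lowest-numbered CPU ids from `available`."""
    if nslots < 1:
        raise ValueError("nslots must be >= 1")
    return set(sorted(available)[:nslots])


def confine(nslots):
    """Pin this process, and anything it starts later, to `nslots` CPUs.

    Linux only: os.sched_getaffinity/os.sched_setaffinity do not exist
    on other platforms.
    """
    allowed = pick_cpus(nslots, os.sched_getaffinity(0))
    os.sched_setaffinity(0, allowed)
    # Hint for OpenMP-based codes; an application may still ignore it and
    # start more threads, but they stay confined to the allowed cores.
    os.environ["OMP_NUM_THREADS"] = str(len(allowed))
    return allowed


# Used as a wrapper: python confine.py <command> [args...]
if __name__ == "__main__" and len(sys.argv) > 1:
    confine(int(os.environ.get("NSLOTS", "1")))
    # Replace this process with the real job; the CPU affinity is inherited.
    os.execvp(sys.argv[1], sys.argv[1:])
```

An application can still spawn as many threads as it likes under this 
scheme, but they all compete for the pinned cores rather than spreading 
across the whole node.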

> So, an idea I had was to get SGE to spawn a virtual host on the
> execution node that uses the requested number of processors.  Then, the
> job would be sandboxed in to however many processors are assigned to the
> virtual host.  However, I have never messed about with virtual hosts in

Hmmm ... ok

> an HPC environment, so I really don't know what the performance issues
> might be.  I wonder if any of you, with greater knowledge on these
> things than myself, would care to comment?

Since you are looking at R and Matlab, I wouldn't expect them to be 
heavy on IO (though it is possible, if you are analyzing huge data sets 
with lots of IO calls).  So most of the virtualized code will likely run 
at close to native speed.  You could do this through Xen, VMware, and 
others.  Some people have done studies on this sort of usage.  In the 
tests we have seen (sadly, not published for a number of reasons), the 
performance impact ranged from 5% at the low end to over 50% at the high 
end.  More recent results, covering additional products, may have been 
published since.  I would imagine we will hear someone touting VirtualBox 
shortly, though I am not sure whether it has been tested in this paradigm 
yet.  I am aware that Xen, VMware, and numerous others have been.

Regardless of the specific VM technology, what you want to do is 
possible and people have been doing this.  The question is whether or 
not the impact will be substantial, and this is in part determined by 
the product, the application mix, and what code paths the application 
itself takes.


Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
