[GE users] CPU time, share-based scheduling, and parallel jobs

jagladden gladden at chem.washington.edu
Fri Jun 5 01:30:13 BST 2009

I have a basic question about how share-based scheduling works.  My 
understanding is that the system resource which the scheduler attempts 
to share among users and/or projects is a calculated combination of CPU 
time, memory use, and I/O use, where the system manager can set the 
weighting of the three underlying resources.
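
To make that understanding concrete, here is a minimal sketch of the kind of weighted combination I have in mind. The names (Usage, combined_usage), the units, and the weights are my own assumptions for illustration, not SGE's actual code or configuration syntax.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Raw per-job usage; field names and units are assumed for illustration."""
    cpu: float  # CPU seconds
    mem: float  # memory integral, e.g. GB-seconds
    io: float   # I/O volume, e.g. GB transferred


def combined_usage(u: Usage, w_cpu: float, w_mem: float, w_io: float) -> float:
    """Weighted sum of the three underlying resources."""
    return w_cpu * u.cpu + w_mem * u.mem + w_io * u.io


job = Usage(cpu=100.0, mem=50.0, io=10.0)
# With all of the weight on CPU, only CPU time counts toward the share.
print(combined_usage(job, 1.0, 0.0, 0.0))
# With mixed weights, memory and I/O also contribute.
print(combined_usage(job, 0.5, 0.25, 0.25))
```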

Our cluster is mostly homogeneous, will offer only one job slot per CPU 
core, and will be dominated by parallel jobs.  In this environment, the 
resource we really want to share among users/projects is CPU core wall 
clock time.  In other words, if a core has been allocated to a job, that 
core is unavailable to other users and is being "consumed" for the 
duration of the job - irrespective of whether the job actually keeps 
the core busy.  In most cases this distinction is probably a quibble, 
since parallel jobs are typically compute bound and really do keep their 
allocated cores busy.
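
As a rough sketch of the quantity I mean (function names and numbers are hypothetical), core wall clock time is just the number of allocated slots multiplied by the elapsed time of the job, and it can differ from measured CPU time when a job leaves its cores idle:

```python
def core_wallclock_seconds(slots: int, wallclock_seconds: float) -> float:
    """Core time reserved by a job: every allocated slot counts for the whole run."""
    return slots * wallclock_seconds


def cpu_efficiency(cpu_seconds: float, slots: int, wallclock_seconds: float) -> float:
    """Fraction of the reserved core time that the job actually kept busy."""
    return cpu_seconds / core_wallclock_seconds(slots, wallclock_seconds)


# A 16-slot job that runs for one hour reserves 57600 core-seconds,
# whether or not it keeps the cores busy.
reserved = core_wallclock_seconds(16, 3600.0)
print(reserved)

# If the job only accumulated 28800 CPU seconds, half of the reserved
# core time was idle, yet those cores were unavailable to everyone else.
print(cpu_efficiency(28800.0, 16, 3600.0))
```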

However, it is not clear to me that we can guarantee that all parallel 
applications will be "tightly integrated."  My understanding is that, in 
the absence of tight integration, SGE will not be able to keep track of 
all of the CPU time used by a job if the job is spread across multiple 
nodes.  So I am wondering whether, in an environment that uses 
share-based scheduling, "tight integration" of parallel jobs becomes 
critical to making the scheduler work correctly.  Is this a problem in 
practice?  Or is there a way to tell the scheduler to not bother with 
tracking actual job CPU use and just track core wall clock time?
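
To illustrate the concern, here is a toy model (not SGE code). It assumes that without tight integration only the CPU time of processes on the master node is visible to the accounting; the function name and the per-node figures are made up for the example.

```python
def accounted_cpu_seconds(per_node_cpu: list[float], tightly_integrated: bool) -> float:
    """CPU seconds charged to a job; per_node_cpu lists the master node first.

    Assumption: under loose integration only the master node's usage is seen.
    """
    if tightly_integrated:
        return sum(per_node_cpu)
    return per_node_cpu[0]


# A job spread over four nodes, each burning 100 CPU seconds.
usage = [100.0, 100.0, 100.0, 100.0]
print(accounted_cpu_seconds(usage, tightly_integrated=True))   # all four nodes counted
print(accounted_cpu_seconds(usage, tightly_integrated=False))  # master node only
```

Under this assumption the loosely integrated job would be charged for a quarter of what it used, which would skew the shares in its owner's favor.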

James Gladden

