[GE users] CPU time, share-based scheduling, and parallel jobs
gladden at chem.washington.edu
Fri Jun 5 01:30:13 BST 2009
I have a basic question about how share-based scheduling works. My
understanding is that the system resource which the scheduler attempts
to share among users and/or projects is a calculated combination of CPU
time, memory use, and I/O use where the system manager can set the
weighting of the three underlying resources.
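To make that understanding concrete, here is a minimal sketch of what I mean by a "calculated combination." It assumes a simple weighted sum; the function name, units, and weights are made up for illustration and are not taken from the SGE source or documentation.

```python
# Hypothetical illustration only; the real scheduler may combine
# these quantities differently.
def combined_usage(cpu_seconds, mem_gb_seconds, io_gb, w_cpu, w_mem, w_io):
    """Return a weighted sum of the three underlying resources."""
    return w_cpu * cpu_seconds + w_mem * mem_gb_seconds + w_io * io_gb


# With all of the weight placed on CPU, only CPU time contributes.
cpu_only = combined_usage(3600.0, 500.0, 20.0, w_cpu=1.0, w_mem=0.0, w_io=0.0)
print(cpu_only)
```

If this picture is right, the weights decide which of the three resources dominates a user's or project's accumulated usage.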
Our cluster is mostly homogeneous, will offer only one job slot per CPU
core, and will be dominated by parallel jobs. In this environment, the
resource we really want to share among users/projects is CPU core wall
clock time. In other words, if a core has been allocated to a job, that
core is unavailable to other users and is being "consumed" for the
duration of the job, irrespective of whether the job actually keeps
the core busy. In most cases this distinction is probably a quibble,
since parallel jobs are typically compute bound and really do keep their
allocated cores busy.
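As a rough sketch of the distinction, the quantity I would like to charge is slots times elapsed time, regardless of measured CPU time. The function names and job figures below are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical illustration of "core wall clock time" versus measured CPU time.
def core_wallclock_seconds(slots, wallclock_seconds):
    """Core-seconds reserved by a job: one slot per core, held for the whole run."""
    return slots * wallclock_seconds


def cpu_utilization(measured_cpu_seconds, slots, wallclock_seconds):
    """Fraction of the reserved core-seconds that was recorded as CPU time."""
    return measured_cpu_seconds / core_wallclock_seconds(slots, wallclock_seconds)


# A made-up 16-slot job that runs for one hour reserves 57600 core-seconds.
reserved = core_wallclock_seconds(16, 3600)
# If only half of that is recorded as CPU time, the job is charged for half
# of what it actually made unavailable to other users.
fraction = cpu_utilization(28800.0, 16, 3600)
print(reserved, fraction)
```

For a compute-bound job whose CPU time is fully recorded the fraction is close to 1, and the two measures agree.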
However, it is not clear to me that we can guarantee that all parallel
applications will be "tightly integrated." My understanding is that, in
the absence of tight integration, SGE will not be able to keep track of
all of the CPU time used by a job if the job is spread across multiple
nodes. So I am wondering whether, in an environment that uses
share-based scheduling, "tight integration" of parallel jobs becomes
critical to making the scheduler work correctly. Is this a problem in
practice? Or is there a way to tell the scheduler to not bother with
tracking actual job CPU use and just track core wall clock time?