F.1 Chapter
Quick Quiz .1:
Come on now!!!
Parallel programming has been known to be exceedingly
hard for many decades.
You seem to be hinting that it is not so hard.
What sort of game are you playing?
Answer:
If you really believe that parallel programming is exceedingly
hard, then you should have a ready answer to the question
``Why is parallel programming hard?''
One could list any number of reasons, ranging from deadlocks to
race conditions to testing coverage, but the real answer is that
it is not really all that hard.
After all, if parallel programming were really so horribly difficult,
how could a large number of open-source projects, ranging from Apache
to MySQL to the Linux kernel, have managed to master it?
A better question might be: ``Why is parallel programming perceived to be so difficult?''
To see the answer, let's go back to the year 1991.
Paul McKenney was walking across the parking lot to Sequent's
benchmarking center carrying six dual-80486 Sequent Symmetry CPU
boards, when he suddenly realized that he was carrying several
times the price of the house he had just purchased. This high cost of parallel systems meant that
parallel programming was restricted to a privileged few who
worked for an employer who either manufactured or could afford to
purchase machines costing upwards of $100,000 -- in 1991 US dollars.
In contrast, in 2006, Paul finds himself typing these words on a
dual-core x86 laptop.
Unlike the dual-80486 CPU boards, this laptop also contains
2GB of main memory, a 60GB disk drive, a display, Ethernet,
USB ports, wireless, and Bluetooth.
And the laptop is more than an order of magnitude cheaper than
even one of those dual-80486 CPU boards, even before taking inflation
into account.
Parallel systems have truly arrived.
They are no longer the sole domain of a privileged few, but something
available to almost everyone.
The earlier restricted availability of parallel hardware is
the real reason that parallel programming is considered
so difficult.
After all, it is quite difficult to learn to program even the simplest
machine if you have no access to it.
Since the age of rare and expensive parallel machines is for the most
part behind us, the age during which
parallel programming is perceived to be mind-crushingly difficult is
coming to a close.
Quick Quiz .2:
How could parallel programming ever be as easy
as sequential programming?
Answer:
It depends on the programming environment.
SQL [Int92] is an underappreciated success
story, as it permits programmers who know nothing about parallelism
to keep a large parallel system productively busy.
We can expect more variations on this theme as parallel
computers continue to become cheaper and more readily available.
For example, one possible contender in the scientific and
technical computing arena is MATLAB*P,
which is an attempt to automatically parallelize common
matrix operations.
Finally, on Linux and UNIX systems, consider the following
shell command:
get_input | grep "interesting" | sort
This shell pipeline runs the get_input, grep,
and sort processes in parallel.
There, that wasn't so hard, now was it?
Quick Quiz .3:
Oh, really???
What about correctness, maintainability, robustness, and so on?
Answer:
These are important goals, but they are just as important for
sequential programs as they are for parallel programs.
Therefore, important though they are, they do not belong on
a list specific to parallel programming.
Quick Quiz .4:
And if correctness, maintainability, and robustness don't
make the list, why do productivity and generality?
Answer:
Given that parallel programming is perceived to be much harder
than is sequential programming, productivity is paramount and
therefore must not be omitted.
Furthermore, high-productivity parallel-programming environments
such as SQL have been special purpose, hence generality must
also be added to the list.
Quick Quiz .5:
Given that parallel programs are much harder to prove
correct than are sequential programs, again, shouldn't
correctness really be on the list?
Answer:
From an engineering standpoint, the difficulty in proving
correctness, either formally or informally, would be important
insofar as it impacts the primary goal of productivity.
So, in cases where correctness proofs are important, they
are subsumed under the ``productivity'' rubric.
Quick Quiz .6:
What about just having fun?
Answer:
Having fun is important as well, but, unless you are a hobbyist,
it would not normally be a primary goal.
On the other hand, if you are a hobbyist, go wild!
Quick Quiz .7:
Are there no cases where parallel programming is about something
other than performance?
Answer:
There are certainly cases where the problem to be solved is
inherently parallel, for example, Monte Carlo methods and
some numerical computations.
Even in these cases, however, there will be some amount of
extra work managing the parallelism.
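For example, here is a minimal sketch, not taken from this book's code and with arbitrary thread and sample counts, of a Monte Carlo estimate of pi using POSIX threads. The sampling itself is embarrassingly parallel, but the thread creation, per-thread seeding, and final combining of the counts are exactly the sort of parallelism-management work referred to above.

/*
 * Hedged sketch: Monte Carlo estimate of pi with pthreads.
 * Build with something like: cc -O2 mc_pi.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 4
#define SAMPLES_PER_THREAD 1000000

struct mc_arg {
	unsigned int seed;	/* per-thread RNG state for rand_r() (POSIX) */
	long hits;		/* samples landing inside the unit circle */
};

static void *mc_worker(void *p)
{
	struct mc_arg *a = p;
	long i;

	for (i = 0; i < SAMPLES_PER_THREAD; i++) {
		double x = (double)rand_r(&a->seed) / RAND_MAX;
		double y = (double)rand_r(&a->seed) / RAND_MAX;

		if (x * x + y * y <= 1.0)
			a->hits++;
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	struct mc_arg arg[NTHREADS];
	long total = 0;
	int i;

	for (i = 0; i < NTHREADS; i++) {	/* extra work: fan the samples out */
		arg[i].seed = i + 1;
		arg[i].hits = 0;
		pthread_create(&tid[i], NULL, mc_worker, &arg[i]);
	}
	for (i = 0; i < NTHREADS; i++) {	/* extra work: gather the counts back */
		pthread_join(tid[i], NULL);
		total += arg[i].hits;
	}
	printf("pi is approximately %f\n",
	       4.0 * total / ((double)NTHREADS * SAMPLES_PER_THREAD));
	return 0;
}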
Quick Quiz .8:
Why all this prattling on about non-technical issues???
And not just any non-technical issue, but productivity
of all things?
Who cares?
Answer:
If you are a pure hobbyist, perhaps you don't need to care.
But even pure hobbyists will often care about how much they
can get done, and how quickly.
After all, the most popular hobbyist tools are usually those
that are the best suited for the job, and an important part of
the definition of ``best suited'' involves productivity.
And if someone is paying you to write parallel code, they will
very likely care deeply about your productivity.
And if the person paying you cares about something, you would
be most wise to pay at least some attention to it!
Besides, if you really didn't care about productivity,
you would be doing it by hand rather than using a computer!
Quick Quiz .9:
Given how cheap parallel hardware has become, how can anyone
afford to pay people to program it?
Answer:
There are a number of answers to this question:
- Given a large computational cluster of parallel machines,
the aggregate cost of the cluster can easily justify
substantial developer effort, because the development
cost can be spread over the large number of machines.
- Popular software that is run by tens of millions of users
can easily justify substantial developer effort,
as the cost of this development can be spread over the tens
of millions of users.
Note that this includes things like kernels and system
libraries.
- If the low-cost parallel machine is controlling the operation
of a valuable piece of equipment, then the cost of this
piece of equipment might easily justify substantial
developer effort.
- If the software for the low-cost parallel machine produces an
extremely valuable result (e.g., mineral exploration),
then the valuable result might again justify substantial
developer cost.
- Safety-critical systems protect lives, which can clearly
justify very large developer effort.
- Hobbyists and researchers might seek knowledge, experience,
fun, or glory rather than mere money.
So it is not the case that the decreasing cost of hardware renders
software worthless, but rather that it is no longer possible to
``hide'' the cost of software development within the cost of
the hardware, at least not unless there are extremely large
quantities of hardware.
Quick Quiz .10:
This is a ridiculously unachievable ideal!
Why not focus on something that is achievable in practice?
Answer:
This is eminently achievable.
The cellphone is a computer that can be used to make phone
calls and to send and receive text messages with little or
no programming or configuration on the part of the end user.
This might seem to be a trivial example at first glance,
but if you consider it carefully you will see that it is
both simple and profound.
When we are willing to sacrifice generality, we can achieve
truly astounding increases in productivity.
Those who cling to generality will therefore fail to set
the productivity bar high enough to succeed in production
environments.
Quick Quiz .11:
What other bottlenecks might prevent additional CPUs from
providing additional performance?
Answer:
There are any number of potential bottlenecks:
- Main memory. If a single thread consumes all available
memory, additional threads will simply page themselves
silly.
- Cache. If a single thread's cache footprint completely
fills any shared CPU cache(s), then adding more threads
will simply thrash the affected caches.
- Memory bandwidth. If a single thread consumes all available
memory bandwidth, additional threads will simply
result in additional queuing on the system interconnect.
- I/O bandwidth. If a single thread is I/O bound,
adding more threads will simply result in them all
waiting in line for the affected I/O resource.
Specific hardware systems may have any number of additional
bottlenecks.
Quick Quiz .12:
What besides CPU cache capacity might require limiting the
number of concurrent threads?
Answer:
There are any number of potential limits on the number of
threads:
- Main memory. Each thread consumes some memory
(for its stack if nothing else), so that excessive
numbers of threads can exhaust memory, resulting
in excessive paging or memory-allocation failures.
- I/O bandwidth. If each thread initiates a given
amount of mass-storage I/O or networking traffic,
excessive numbers of threads can result in excessive
I/O queuing delays, again degrading performance.
Some networking protocols may be subject to timeouts
or other failures if there are so many threads that
networking events cannot be responded to in a timely
fashion.
- Synchronization overhead.
For many synchronization protocols, excessive numbers
of threads can result in excessive spinning, blocking,
or rollbacks, thus degrading performance.
Specific applications and platforms may have any number of additional
limiting factors.
Quick Quiz .13:
Are there any other obstacles to parallel programming?
Answer:
There are a great many other potential obstacles to parallel
programming.
Here are a few of them:
- The only known algorithms for a given project might
be inherently sequential in nature.
In this case, either avoid parallel programming
(there being no law saying that your project has
to run in parallel) or invent a new parallel algorithm.
- The project allows binary-only plugins that share the same
address space, such that no one developer has access to
all of the source code for the project.
Because many parallel bugs, including deadlocks, are
global in nature, such binary-only plugins pose a severe
challenge to current software development methodologies.
This might well change, but for the time being, all
developers of parallel code sharing a given address space
need to be able to see all of the code running in
that address space.
- The project contains heavily used APIs that were designed
without regard to parallelism.
Some of the more ornate features of the System V
message-queue API form a case in point.
Of course, if your project has been around for a few
decades, and if its developers did not have access to
parallel hardware, your project undoubtedly has at least
its share of such APIs.
- The project was implemented without regard to parallelism.
Given that there are a great many techniques that work
extremely well in a sequential environment, but that
fail miserably in parallel environments, if your project
ran only on sequential hardware for most of its lifetime,
then your project undoubtedly has at least its share of
parallel-unfriendly code.
- The project was implemented without regard to good
software-development practice.
The cruel truth is that shared-memory parallel
environments are often much less forgiving of sloppy
development practices than are sequential environments.
You may be well-served to clean up the existing design
and code prior to attempting parallelization.
- The people who originally did the development on your
project have since moved on, and the people remaining,
while well able to maintain it or add small features,
are unable to make ``big animal'' changes.
In this case, unless you can work out a very simple
way to parallelize your project, you will probably
be best off leaving it sequential.
That said, there are a number of simple approaches that
you might use
to parallelize your project, including running multiple
instances of it, using a parallel implementation of
some heavily used library function, or making use of
some other parallel project, such as a database.
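To illustrate just how simple the first of these approaches can be, here is a minimal sketch, with a hypothetical sequential_work() function standing in for the existing single-threaded code, that runs several independent instances of that code, one process per slice of the input. Because the processes share nothing, the sequential code itself need not change at all.

/*
 * Hedged sketch of the "run multiple instances" approach:
 * each child process runs the unmodified sequential code on its
 * own slice of the input, so no shared-memory parallel bugs are possible.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NINSTANCES 4

/* Hypothetical stand-in for the existing, purely sequential code. */
static void sequential_work(int instance)
{
	printf("instance %d processing its own slice of the input\n", instance);
}

int main(void)
{
	int i;

	for (i = 0; i < NINSTANCES; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			perror("fork");
			exit(EXIT_FAILURE);
		}
		if (pid == 0) {			/* child: one independent instance */
			sequential_work(i);
			exit(EXIT_SUCCESS);
		}
	}
	for (i = 0; i < NINSTANCES; i++)	/* parent: wait for all instances */
		wait(NULL);
	return 0;
}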
One can argue that many of these obstacles are non-technical
in nature, but that does not make them any less real.
In short, parallelization can be a large and complex effort.
As with any large and complex effort, it makes sense to
do your homework beforehand.
Quick Quiz .14:
Where are the answers to the Quick Quizzes found?
Answer:
In this very appendix, of course.
Hey, I thought I owed you an easy one!
Quick Quiz .15:
Some of the Quick Quiz questions seem to be from the viewpoint
of the reader rather than the author.
Is that really the intent?
Answer:
Indeed it is!
Many are modeled after Paul--just ask anyone who has had the
misfortune of being assigned to teach him.
Others are quite similar to actual questions that have been asked
during conference presentations and lectures covering the
material in this book.
Still others are from the viewpoint of the author.
Quick Quiz .16:
These Quick Quizzes are just not my cup of tea.
What do you recommend?
Answer:
There are a number of alternatives available to you:
- Just ignore the Quick Quizzes and read the rest of
the book.
You might miss out on the interesting material in
some of the Quick Quizzes, but the rest of the book
has lots of good material as well.
- If you prefer a more academic and rigorous treatment of
parallel programming,
you might like Herlihy's and Shavit's
textbook [HS08].
This book starts with an interesting combination
of low-level primitives at high levels of abstraction
from the hardware, and works its way through locking
and simple data structures including lists, queues,
hash tables, and counters, culminating with transactional
memory.
- If you would like an academic treatment of parallel
programming that keeps to a more pragmatic viewpoint,
you might be interested in the concurrency chapter from Scott's
textbook [Sco06]
on programming languages.
- If you are interested in an object-oriented patternist
treatment of parallel programming focusing on C++,
you might try Volumes 2 and 4 of Schmidt's POSA
series [SSRB00,BHS07].
Volume 4 in particular has some interesting chapters
applying this work to a warehouse application.
The realism of this example is attested to by
the section entitled ``Partitioning the Big Ball of Mud'',
wherein the problems inherent in parallelism often
take a back seat to the problems inherent in getting
one's head around a real-world application.
- If your primary focus is scientific and technical computing,
and you prefer a patternist approach,
you might try Mattson et al.'s
textbook [MSM05].
It covers Java, C/C++, OpenMP, and MPI.
Its patterns are admirably focused first on design,
then on implementation.
- If you are interested in POSIX Threads, you might take
a look at David R. Butenhof's book [But97].
- If you are interested in C++, but in a Windows environment,
you might try Herb Sutter's ``Effective Concurrency''
series in
Dr. Dobb's Journal [Sut08].
This series does a reasonable job of presenting a
commonsense approach to parallelism.
- If you want to try out Intel Threading Building Blocks,
then perhaps James Reinders's book [Rei07]
is what you are looking for.
- Finally, those preferring to work in Java might be
well-served by Doug Lea's
textbooks [Lea97,GPB+07].
In contrast, this book meshes real-world machines with real-world
algorithms.
If your sole goal is to find an optimal parallel queue, you might
be better served by one of the above books.
However, if you are interested in principles of parallel design
that allow multiple such queues to operate in parallel, read on!