Parallel Programming with Microsoft Visual C++
Saturday, October 30, 2010 – 2:09 PM

As I've mentioned before, we've been quietly working away on a version of the .NET Parallel Programming book for C++ developers who want to use the Parallel Patterns Library or Asynchronous Agents Library to add parallelism to their applications.
Well… we now have some draft chapters and example code ready for review.
This book describes patterns for parallel programming, with code examples, that use the new parallel programming support in Visual C++. This support is commonly referred to as the Parallel Patterns Library (PPL). There is also an example of how to use the Asynchronous Agents Library in conjunction with the PPL…
The CPU meter shows the problem. One core is running at 100 percent, but all the other cores are idle. Your application is CPU-bound, but you are using only a fraction of the computing power of your multicore system. What next?
The Dataflow Network pattern decomposes computation into cooperating asynchronous components that communicate by sending and receiving messages. Buffering the messages allows concurrency. There are a variety of techniques for implementing dataflow networks. The techniques described in this chapter involve the use of in-process messaging blocks and asynchronous agents, which are provided by the Asynchronous Agents library.
There are also samples for Visual Studio 2010 to go with the chapters. You can find all of this on the project's downloads page.
Much of the introductory material is similar to the .NET book; this is intentional. The patterns are the same, but how they are implemented using the PPL and Asynchronous Agents Library is not.
We’d love to hear your feedback on this draft material. If you have time to read the chapters or look at the code and post comments on the CodePlex site, that would be great. All feedback is read and taken into account as we shape the material.
Thanks!
Ade
6 Responses to “Parallel Programming with Microsoft Visual C++”
I am wondering if you could provide some benchmark statistics about your examples, such as how well the work is divided among threads/cores, which would be an intuitive way of seeing the power of the libraries.
I am also interested in seeing a comparison between the VC++ libraries, TBB, and other open source libraries: not about which is better or worse, but about the features of each.
Thanks.
By tom on Nov 23, 2010
Just gave the C++ Parallel Patterns Library a whirl: on a Core2 660 @ 2.4 GHz, an HMAC task ran in 78 seconds with PPL. It took 190 seconds serially.
I’m just starting to learn PPL, so I’ll try to give some feedback on the chapters as I get a chance to read them.
Here is some feedback on the introduction:
Figure 1: “Parallel programming patterns” is helpful once you understand the terms. The table on page 17 is even more helpful, and might be better placed earlier. It takes the reader from the characteristics of their algorithm to the pattern and chapter.
—
This paragraph seems muddled to me:
“Another advantage to grouping work into larger and fewer tasks is that such tasks are often more independent of each other than smaller but more numerous tasks. Larger tasks are less likely than smaller tasks to share local variables or fields. Unfortunately, in applications that rely on large mutable object graphs, such as applications that expose a large object model with many public classes, methods, and properties, the opposite may be true. In these cases, the larger the task, the more chance there is for unexpected sharing of data or other side effects.”
The guidance should be to build independent tasks large enough to make the overhead unimportant, I think. Not much point in telling the user that sometimes large tasks have dependencies, and sometimes they don’t.
—
Amdahl’s law can be concisely explained with this equation, assuming linear speedup:

total time = serial time + (parallel time / processors)
As the number of processors goes to infinity, the serial time dominates.
The speedup is time for N processors divided by time for one processor, so for maximum N,
speedup is (s + p) / s
which is precisely the inverse of the fraction of time spent in serial code.
So why did my program actually run more than twice as fast with PPL on two cores as it did as straight C++ code? Looking at the task manager, I could see that both cores were actually running at about 50% with the straight C++ code. Since I used some ncrypt library calls, ncrypt must have been dividing the execution between the two cores. My guess is that the overhead in this process caused the serial program to run just a bit slower than it would have if it had run on one core. Both cores ran at 100% with PPL.
By Andrew Webb on Dec 21, 2010
Correction:
Time for one processor = s + p
Time for infinite processors = s
The speedup is time for one processor divided by time for infinite processors, so
speedup is (s + p) / s
and s /(s+p) is the fraction of time spent serially.
By Andrew Webb on Dec 22, 2010
Andrew,
Thanks for the feedback. I’ll review it and try to get some of the changes into the text. We’re running up against deadlines. There should be more chapters to read late this week.
Thanks again!
Ade
By Ade Miller on Jan 3, 2011
Tom,
Most of the samples show times for execution of both serial and parallel versions of each example. We also ship some profiler data.
Is this what you were looking for?
Ade
By Ade Miller on Jan 3, 2011