This is one of several forthcoming posts on concurrency and parallel computing. Much of the content may end up in a white paper I’m writing to help developers get to grips with concurrent or parallel computing. I’d love to hear your feedback on this. If you think this is helpful or could be improved then post a comment below.
What if someone told you that your application isn’t going to run faster on the next generation of hardware? What if they said that it might actually run slower? What if they told you that to harness the power in future generations of hardware your applications are going to have to execute code in parallel?
Executing in parallel means different parts of your application are running concurrently. Many applications do this today by using background threads to do work while their main UI thread continues to respond to user input. But parallel computing usually means a lot more concurrency than this. Applications might have tens or hundreds of concurrent operations working together on the same task, not just a couple of worker threads.
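To make that distinction concrete, here's a minimal sketch, in Python rather than the .NET stack discussed below, contrasting the familiar single background worker with many concurrent operations cooperating on one task (the chunk sizes and worker counts are arbitrary choices for illustration):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Pattern 1: a single background worker, as many desktop apps use today.
# The main ("UI") thread stays free while the worker runs.
def background_job(results):
    results.append(sum(range(1_000_000)))

results = []
worker = threading.Thread(target=background_job, args=(results,))
worker.start()
# ...the main thread could keep handling user input here...
worker.join()

# Pattern 2: many concurrent operations cooperating on ONE task.
# The same sum is split into ten chunks processed by a pool of workers.
def chunk_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

chunks = [(i, i + 100_000) for i in range(0, 1_000_000, 100_000)]
with ThreadPoolExecutor(max_workers=10) as pool:
    total = sum(pool.map(chunk_sum, chunks))

assert total == results[0]  # both approaches compute the same answer
```

The second pattern is the shape most parallel programs take: decompose one job into many independent pieces, then combine the partial results.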
Parallel computing has been around for a long time. For most of that time it’s been talked about as the next big thing. Like pen computing it was going to be “huge next year”. I was playing around with parallel machines back in the early 90s and while they held their own in certain specialized fields, like scientific modeling and engineering, they didn’t really move into the mainstream. Yes, they moved into other fields like financial modeling and graphics but hardly became a mainstream or desktop phenomenon.
Why was this? Two reasons, really. First, the hardware was costly and specialized. The first parallel computers used custom-built hardware and were very expensive. The machine I wrote applications for in 1990 used custom hardware, ran a proprietary operating system and development language, and would have cost over $100k in today’s money. Over time this changed somewhat as the market moved to clusters of Windows or Linux machines, which could be assembled from off-the-shelf hardware and use common libraries or compiler extensions, making the applications more portable.
Second, and this hasn’t changed, writing software for these parallel machines is still a specialized field, and for many the investment in porting an existing application to a different hardware platform and rewriting it to take advantage of a parallel architecture wasn’t worth the effort. Especially while the performance of regular desktop machines kept increasing. Up until now.
The end of the free lunch
What’s changed today? Well firstly the “free lunch” provided by ever-rising processor clock speeds is pretty much coming to an end. Clock speeds are no longer rising (see graph). So the days of your application running twice as fast when recompiled on the next generation of processors are rapidly coming to a close. Moore’s Law still holds, so for a few more generations the number of transistors will continue to double. But that just means more transistors running at the same speed.
So how do you take advantage of this? The answer is concurrency: add more cores on the same processor die and allow instructions to execute in parallel on each core. Two cores aren’t typically going to run twice as fast as one, but a reasonably well-written application might see a 1.5x improvement over a single core running at the same speed. This varies a lot by workload, but in general two cores provide more processing power than one core, just not twice as much.
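That "more than one core, but less than two" behavior falls out of Amdahl's law, which relates overall speedup to the fraction of the work that can actually run in parallel. As a sketch, assuming a hypothetical workload where two-thirds of the time parallelizes perfectly:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when only a fraction of the
    work can be spread across multiple cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# If two-thirds of the workload parallelizes perfectly, two cores give
# roughly the 1.5x figure mentioned above.
two_core = round(amdahl_speedup(2 / 3, 2), 3)   # 1.5

# The serial third dominates as cores are added: even a million cores
# can never push this workload past a 3x speedup.
million_core = amdahl_speedup(2 / 3, 1_000_000)
assert million_core < 3.0
```

The sobering corollary is that the serial portion, not the core count, quickly becomes the limit, which is why redesigning applications to reduce serial work matters so much.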
What’s even more interesting is that this is here today. The computer you’re reading this on now is probably running code in parallel on two or more cores. Multi-core x86 processors started appearing in 2004 and today pretty much any new machine sold will have at least two cores. The latest generation of desktop applications – like Excel 2007 and AutoDesk’s applications – have already been modified to take advantage of multi-core hardware.
Time for some terminology. Multi-core, manycore, what’s the difference?
“In general, a ‘multi-core’ chip refers to eight or less homogeneous cores in one microprocessor package, whereas a ‘manycore’ chip has more than eight possibly heterogeneous cores in one microprocessor package. In a manycore system, all cores share the resources and services, including memory and disk access, provided by the operating system. Microsoft and industry partners anticipate the advent of affordable general-purpose ‘manycore’ systems in a few years.”
– The Manycore Shift, Microsoft (Nov 2007)
So we have multi-core on the desktop today. In some ways we have manycore too. Some sections of the scientific and engineering communities are already using the parallel processing capabilities of graphics cards – which contain hundreds of cores – as a compute resource for certain types of problems. There are some great examples of this on Nvidia’s CUDA Zone web site.
So the hardware is here but what about the software? In order to take advantage of this new parallel world, applications need to be redesigned and the work distributed across the available cores. None of this makes any difference if only a handful of people can actually write programs that run on these new multi- or manycore processors. The changes in hardware are starting to drive new investments in software too.
“The software development industry is taking strides to make parallelism more accessible to all developers, and Microsoft is helping to lead the way. With Microsoft Visual Studio 2010, Microsoft is delivering the first wave of powerful developer tools that ease the transition to parallel code.”
– Taking Parallelism Mainstream, Microsoft (Oct 2008)
So that sounds pretty serious, and if you take a look there’s a lot coming in the Visual Studio 2010 timeframe. The pretty diagram below shows libraries for both managed and unmanaged code, tooling support in the form of debuggers and profilers as well as investments at the OS and hardware level.
Really this is about raising the level of abstraction and tooling when it comes to writing concurrent code. This is a good thing. Most significant advances in software development have been about providing more abstraction, allowing developers to think about their customer’s problem rather than their implementation problem. Assembler code gave way to languages like C; managed languages like C# raised the level of abstraction yet again. Today developers think a lot less about memory management and memory locations and more about the data structures they’ve defined.
Similarly, when it comes to writing concurrent applications it would be a lot better if developers didn’t have to think about threads and threading all the time. You can write code today that takes advantage of multi-core processors, but it’s a lot of work: thinking about threads, the flow of execution across them, and their use of shared memory.
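The gap between the two styles is easy to see side by side. Here's a sketch in Python (standing in for whichever managed or unmanaged stack you actually use): the first version manages threads, work splitting, and a lock on shared memory by hand; the second just describes the work and lets a pool schedule it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUMBERS = list(range(1, 101))

# Explicit threading: you own the threads, the work split, and the
# lock protecting shared state.
def threaded_total(numbers, workers=4):
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # shared memory needs explicit protection
            total += partial

    step = len(numbers) // workers
    chunks = [numbers[i:i + step] for i in range(0, len(numbers), step)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

# Higher abstraction: describe the work, let the pool schedule it.
# No explicit threads, joins, or locks.
def pooled_total(numbers, workers=4):
    step = len(numbers) // workers
    chunks = [numbers[i:i + step] for i in range(0, len(numbers), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))

assert threaded_total(NUMBERS) == pooled_total(NUMBERS) == 5050
```

Both compute the same answer, but in the second version an entire class of bugs (forgotten joins, missed locks, races on shared state) simply has nowhere to live. That's the kind of shift the tooling investments above are aiming at.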
I’d like to consider all this from my perspective as a development manager and lead. What’s the bigger picture? How does this play into application development today? What are the proven practices and patterns which may help? How can you get the other developers on the team up to speed? Do you have to? What parts of the application should be parallelized? How should you go about it?
This might seem like a tall order and very speculative given that everything in the diagram above is still yet to ship in beta. The key here is that while all of this is new to most PC developers, the High Performance Computing community has been working on it for many years, figuring out what works and what doesn’t. Amazingly, some of the things they figured out twenty years ago are still relevant today.
So while I work on part 2, here are some of the resources I’ve found useful so far (in no particular order):
- The Manycore Shift – a white paper from Microsoft describing how they think manycore will impact computing in the next few years.
- Taking Parallelism Mainstream – another white paper from Microsoft outlining their plans to make the vision a reality. Specifically what you’ll see in the next release of Visual Studio.
- Parallel Computing Developer Center on MSDN – Lots and lots of information on developments in Microsoft’s developer support for parallel computing.
- CUDA Zone – including a paper on running N-Body models on CUDA.
- Patterns for Parallel Programming (Software Patterns Series) – A pattern language for parallel programming. A great read which I’m still chugging through.
- Intel’s Parallelism Breakthrough Video Series – lots of really cool five minute videos of Intel’s James Reinders explaining some of the basics of parallel computing.
- A Fundamental Turn Toward Concurrency in Software – Herb Sutter writes in Dr. Dobbs. Herb has written a lot of articles for DDJ; most of them are worth reading.
- Software and the Concurrency Revolution – Herb Sutter & Jim Larus.
There’s also some good stuff on what might happen in the future and how the level of abstraction may rise still further.
- Intel’s Larrabee architecture – Intel aren’t going to let Nvidia’s CUDA or ATI’s Stream have the GPU market to themselves. This is a review of their latest announcement in the GPU space.
- The Maestro (now Axum) incubation project blog – A DSL for describing parallel systems.
- Transactional memory incubation project blog – Transactional memory, a world without (explicit) locks.