## C# Optimization Revisited Part 1: Algorithms and Bill Clinton

Thursday, March 19, 2009 – 3:58 PM

As part of my journey in technical computing I’ve been writing a gravitational N-body simulation. Turns out you learn a thing or two about optimization when building numerical simulations that need to run as fast as possible.

First off I used Vance Morrison’s code timer library to run multiple passes and gather statistics. You can’t optimize unless you know how fast, or slow something already is and where the bottlenecks are. Then I implemented to integration algorithms in my N-body code and compared them.

### Two algorithms compared

If you’re in a hurry and/or remember some high school math or physics then you can just look at the equations and the code snippets and skip to the final section.

Most of the computational heavy lifting in an N-body model occurs in series of tight loops where the positions of each body in the model is updated, the Integrate method. Each Integrate method takes a time interval and array of bodies and updates each Body in the array based on the time interval.

The simplest algorithm is the Forward Euler. Forward Euler calculates the gravitational force on each body and uses this to calculate the acceleration. It then updates the position (based on the current velocity) and velocity using the current computed acceleration: Where  r, v and a are vectors of the position, velocity and acceleration of a body at integration step n and (n + 1). The time step, dt, is the duration of time between integration steps.

Just for completeness, the acceleration, a, due to the force, fij between any two bodies i and j is calculated as follows: Where m is the mass, rij is the vector between the two bodies and G is Newton’s gravitational constant. In the implementation below the Body.Force method implements the first equation.

A Forward Euler implementation:

```    public void Integrate(double dT, Body[] bodies)
{
for (int i = 0; i < bodies.Length; i++)
bodies[i].Acceleration = new Vector();

for (int i = 0; i < bodies.Length; i++)
{
for (int j = (i + 1); j < bodies.Length; j++)
{
Vector f = Body.Force(bodies[i], bodies[j]);
bodies[i].Acceleration += f / bodies[i].Mass;
bodies[j].Acceleration -= f / bodies[j].Mass;
}

bodies[i].Position += bodies[i].Velocity * dT;
bodies[i].Velocity += bodies[i].Acceleration * dT;
}
}```

It’s the simplest integration approach, and it turns out there are several ways to improve on it.

Make way for the Leapfrog algorithm!

Leapfrog algorithm makes use of the acceleration from the previous time step and factors this into the calculation of the new position. It also uses the current and previous time step values for acceleration when calculating the velocity. A Leapfrog implementation:

```    public void Integrate(double dT, Body[] bodies)
{
for (int i = 0; i < bodies.Length; i++)
{
bodies[i].Velocity += bodies[i].Acceleration * (dT / 2.0);
bodies[i].Position += bodies[i].Velocity * dT;
}

for (int i = 0; i < bodies.Length; i++)
bodies[i].Acceleration = new Vector();

for (int i = 0; i < bodies.Length; i++)
{
for (int j = (i + 1); j < bodies.Length; j++)
{
Vector f = Body.Force(bodies[i], bodies[j]);
bodies[i].Acceleration += f / bodies[i].Mass;
bodies[j].Acceleration -= f / bodies[j].Mass;
}
bodies[i].Velocity += bodies[i].Acceleration * (dT / 2.0);
}
}```

### But which is better? Which is faster? If you said the Forward Euler you’d be both right and wrong. The performance results (on the right) show that for the same number of integrations the Forward Euler is marginally faster, 8.6 vs. 7.9 seconds. It also needs less data per Body as the acceleration need not be persisted between time steps.

Unfortunately fast isn’t good enough in this case. Sure, there’s less code so the algorithm executes faster but the results are less accurate. The Leapfrog algorithm makes uses the acceleration at the current and next time steps to calculate better a better result for the velocity and position. So these raw performance numbers don’t compare the same things. They compare time per integration not time taken to simulate to the same accuracy over a set amount of time.

So… In order to obtain the same degree of accuracy the time step (dT) used in the Forward Euler implementation needs to be much smaller than that used in the Leapfrog (see graph to right). In other words I have to call the Forward Euler code many more times – hundreds more times – with a smaller value of dT for every call to the Leapfrog implementation. This more than outweighs the slight raw speed improvement of the Forward Euler implementation.

Finally…

To paraphrase Bill Clinton “It’s the algorithm stupid”. Regardless of the complexity of the math in the above example the lesson I learnt here is… After you’ve established how slow something is think about the underlying algorithm before trying to tweak the algorithm you may already have.

In part 2 I’ll cover another consideration, concurrency and manycore processors.

1. ## 8 Responses to “C# Optimization Revisited Part 1: Algorithms and Bill Clinton”

2. Hi.
An interesting blog entry. However I have a recommendation about the code. I think you should rewrite this for loop

for (int i = 0; i < bodies.Length; i++)
{
bodies[i].Velocity += bodies[i].Acceleration * (dT / 2.0);
bodies[i].Position += bodies[i].Velocity * dT;
}

to this :
for (int i = 0; i < bodies.Length; i++)
{
var b = bodies[i];
b.Velocity += b.Acceleration * (dT / 2.0);
b.Position += b.Velocity * dT;
}

It will (probably) run faster.

3. Petar,

Thanks for your suggestion. I tried this and ran some tests and the difference is negligible. In fact in most of my runs the version you’d expect to be faster is actually slightly slower but taking into account the margin of error you would have to say they were pretty much equal at best.

```Simple: 239.4 ms +- 1.8% Optimized: 251.5 ms +- 1.7%```

I’m still looking at the generated IL and assembler to see why this is.

Really this is an example of a micro-optimization. My opinion is that these are very rarely worth doing. At this point the programmer is second guessing the compiler and JITer.

In this specific scenario Leapfrog doesn’t represent the best algorithm. Firstly, there are forth order algorithms – like Hermite – that take longer per iteration but are much more accurate. Secondly this code is only executing on one core of a several on the machine. I have a future post in the works that uses the Parallel Extensions to .NET 3.5 to give nearly a 4x improvement in performance on an Intel i7 processor.

So time spent thinking about the than micro-optimizing the existing code is actually better spent thinking about overall optimization.

4. I totally agree with you. I’m also using Parallel Extensions. It’s a micro optimization but because you talk about optimization I think it’s worth mention it. You’ll start to see a difference (between the two versions) when you make many calls to bodies[i] in the body of your loop.

5. Petar,

That might be true but the numbers above are for an array of 2,000,000 Bodies on a dual core machine. I see similar numbers on an 4 (8) core i7 with 6Gb and 4,000,000 bodies – these are way beyond what’s possible on existing hardware.

In all cases the optimization runs slightly slower within the margin of error of the test but it always comes out behind. If you look at the IL generated by the original version it appears to be using intermediate variables anyhow. So I’m not convinced this is actually an improvement at all, although I do think it makes the code more readable.