C# Inline Methods and Optimization
Tuesday, August 19, 2008 – 2:10 PM
Last night I was playing around with a vector class that’s part of a scientific computation code I’ve been working on. It’s a long story (more on that later), but it means I’ve found some of my own time to write code in. Anyway… The Vector class uses the automatically implemented properties feature of C# 3.0:
public class Vector
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }
    // ...
}
I was concerned that the property getters and setters were adding overhead to my class. Normally this would be a non-issue, but in this case the Vector class is being used inside a series of tight loops. According to the profiler, over 90% of the application’s time is spent inside these loops, and the property getters and setters get called hundreds of thousands of times as the loop works through some vector math. That’s a lot of method calls, and I wanted to make sure that they were really getting inlined. As Eric Gunnerson points out, there is no inline keyword in C#.
Looking at the generated code
So the best way to figure this out is to look at the code that gets generated. I created a class to investigate this:
public class MyClass
{
    public int A { get; set; }
    public int C;
}
I’m using property get here but the discussion applies to any small method you hope the compiler will move inline. The calling code looks like this:
public void TestInlineMethods()
{
    MyClass target = new MyClass();
    int a = target.A;
    Console.WriteLine("A = {0}", a);
    int c = target.C;
    Console.WriteLine("C = {0}", c);
}
Running this from the debugger and looking at the disassembly shows the following code. Clearly the call to the property A getter did not get inlined: it calls MyClass.get_A(), and you can step through this code in the debugger.
int a = target.A;
0000003e mov ecx,edi
00000040 cmp dword ptr [ecx],ecx
00000042 call dword ptr ds:[05FA29A8h]
00000048 mov esi,eax
0000004a mov dword ptr [esp+4],esi
int c = target.C;
00000098 mov edi,dword ptr [edi+4]
MyClass.get_A() looks like this:
00000000 push esi
00000001 mov esi,ecx
00000003 cmp dword ptr ds:[03B701DCh],0
0000000a je 00000011
0000000c call 76BA6BA7
00000011 mov eax,dword ptr [esi+0Ch]
00000014 pop esi
00000015 ret
This seemed a little odd, as property getters seem like an ideal candidate for inlining. The inline version is one instruction, compared to thirteen to make the method call.
What’s going on here?
Further investigation and conversations with p&p’s Chris Tavares pointed me in the right direction. There are two compilers at work here.
The C# compiler takes the code and turns it into IL at compile time, and the JIT compiler takes the IL and generates native machine code at runtime based on the target processor architecture. The JIT compiler is responsible for deciding what to inline. This is because only the JIT compiler knows enough about the processor architecture to decide if putting a method inline is appropriate, as it’s a tradeoff between instruction pipelining and cache size.
If you start your application from Visual Studio with the debugger attached (F5), then all the JIT optimizations will be disabled, even if optimization is enabled in the build. If you want to see the optimized code, you need to compile the application with optimization enabled (in the Build tab of the project properties), start it without the debugger (Ctrl+F5), and then attach the debugger to it. This is covered in “Writing High-Performance Managed Applications : A Primer”, in the “Why do I not see these optimizations in Visual Studio?” section about halfway down.
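Before reading a disassembly, it can help to confirm which kind of build and session you are actually in. The following is a minimal sketch, not something from the articles cited here: JitInfo and IsJitOptimizerDisabled are hypothetical names. It reads the DebuggableAttribute that the C# compiler emits into the assembly and reports whether a debugger is currently attached. It only reflects the compile-time setting, so it cannot tell you whether Visual Studio is suppressing optimization for the session.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper for illustration; the names are not from the original post.
public static class JitInfo
{
    // Returns true when the assembly carries a DebuggableAttribute that
    // asks the JIT not to optimize (typically a build without /optimize).
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        object[] attributes =
            assembly.GetCustomAttributes(typeof(DebuggableAttribute), false);
        foreach (DebuggableAttribute attribute in attributes)
        {
            if (attribute.IsJITOptimizerDisabled)
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        Assembly assembly = Assembly.GetExecutingAssembly();
        Console.WriteLine("Debugger attached:      {0}", Debugger.IsAttached);
        Console.WriteLine("JIT optimizer disabled: {0}",
            IsJitOptimizerDisabled(assembly));
    }
}
```

If the second line prints True, the disassembly you see will not show inlining no matter how the process was started.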
Attaching the debugger to an optimized executable and showing the disassembly for the A property getter shows that it is now inlined.
int a = target.A;
00000024 mov ebx,dword ptr [edi+0Ch]
You’ll see the same behavior for a conventional property where the get and set methods are explicitly defined in the source code.
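For reference, this is roughly what the conventional form looks like. It is a sketch with an assumed class name (MyClassExplicit) and an assumed backing field name; the auto-implemented property in MyClass is compiled to much the same shape, with a compiler-generated backing field.

```csharp
using System;

// Hypothetical variant of MyClass with an explicitly written property.
public class MyClassExplicit
{
    // Backing field that an auto-implemented property would generate for you.
    private int a;

    public int A
    {
        get { return a; }
        set { a = value; }
    }
}

public static class ExplicitPropertyDemo
{
    public static void Main()
    {
        MyClassExplicit target = new MyClassExplicit();
        target.A = 42;
        Console.WriteLine("A = {0}", target.A);   // prints "A = 42"
    }
}
```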
What gets inlined and when?
How do you determine what gets inlined? The short answer is that you can’t. The MSDN article “Writing High-Performance Managed Applications : A Primer” gives the following guidance:
- Methods that are greater than 32 bytes of IL will not be inlined.
- Virtual functions are not inlined.
- Methods that have complex flow control will not be inlined. Complex flow control is any flow control other than if/then/else; in this case, switch or while.
- Methods that contain exception-handling blocks are not inlined, though methods that throw exceptions are still candidates for inlining.
- If any of the method’s formal arguments are structs, the method will not be inlined.
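To make the list concrete, here is a small sketch with one method per category. The type and method names (Point2, InliningCandidates and its members) are made up for illustration, and the comments only restate the guidance above. They are not a guarantee of what any particular JIT version does, and the only way to check is to look at the optimized disassembly.

```csharp
using System;

// Hypothetical types for illustration only.
public struct Point2
{
    public double X;
    public double Y;
}

public class InliningCandidates
{
    private int value = 1;

    // Small, non-virtual, straight-line code: a candidate per the guidance.
    public int GetValue() { return value; }

    // Virtual method: listed as not inlined.
    public virtual int GetValueVirtual() { return value; }

    // Contains a while loop, which the guidance counts as complex flow control.
    public int SumTo(int n)
    {
        int total = 0;
        int i = 0;
        while (i <= n) { total += i; i++; }
        return total;
    }

    // Contains an exception-handling block: listed as not inlined.
    // Only FormatException is handled; other exceptions still propagate.
    public int ParseOrZero(string text)
    {
        try { return int.Parse(text); }
        catch (FormatException) { return 0; }
    }

    // Takes a struct as a formal argument: listed as not inlined.
    public double Length(Point2 p)
    {
        return Math.Sqrt(p.X * p.X + p.Y * p.Y);
    }
}
```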
But there are no hard and fast rules, and in most cases you shouldn’t have to worry about this. Just let the compilers do their thing, and certainly don’t try to outguess them. If you do want to see what your code looks like, you’ll have to attach the debugger. There’s also further discussion of compiler optimization in MSDN Magazine.
9 Responses to “C# Inline Methods and Optimization”
There are slight changes in .NET 3.5 SP1 with regard to the x86 JIT’s inlining heuristics. Vance Morrison covers this well:
http://blogs.msdn.com/vancem/archive/2008/08/19/to-inline-or-not-to-inline-that-is-the-question.aspx
By Sasha Goldshtein on Aug 20, 2008
Sasha,
Thanks for pointing this out! It’s a really good post and explains the tradeoffs involved in when to inline and when not to.
Seems like simple property accessors will almost always be inlined, as the inline code (one instruction) is smaller than the call (five instructions), even with a multiplier of one:
If InlineSize <= NonInlineSize * Multiplier do the inlining
Thanks
By Ade Miller on Aug 20, 2008
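The rule quoted in the comment above can be written out as a one-line check. This is only a sketch of the comparison as stated there: InlineHeuristic and ShouldInline are hypothetical names, and the sizes are the instruction counts used in the comment as stand-ins, not whatever size estimates the real JIT computes.

```csharp
using System;

// Hypothetical illustration of the quoted rule; not the JIT's actual code.
public static class InlineHeuristic
{
    // Inline when the inlined code is no larger than the call sequence
    // scaled by the multiplier.
    public static bool ShouldInline(int inlineSize, int nonInlineSize, double multiplier)
    {
        return inlineSize <= nonInlineSize * multiplier;
    }

    public static void Main()
    {
        // Numbers from the comment: one instruction inline, five for the call,
        // and a multiplier of one.
        Console.WriteLine(ShouldInline(1, 5, 1.0));   // prints "True"
    }
}
```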
I think a vector like this is a great place to use a struct.
By Eric Gunnerson on Aug 20, 2008
Hi Eric,
I guess my ill-thought-out (manager trying to code) concern when I wrote the class was around memcpy overhead. Thinking about it further, the Vector is only three doubles, so it should be a struct. It “behaves” like an int, so it should really be a struct. Time to rewrite it.
In general though it’s interesting to figure out what the compiler(s) are doing under the hood. It might even make me a better programmer one day… maybe.
Thanks
By Ade Miller on Aug 20, 2008
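As a rough sketch of what that rewrite might look like (VectorStruct, its constructor and the + operator are assumptions, not the author’s actual code), a struct version holds the three doubles as fields and gets int-like copy semantics:

```csharp
using System;

// Hypothetical struct version of the Vector class from the post.
public struct VectorStruct
{
    public double X;
    public double Y;
    public double Z;

    public VectorStruct(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Component-wise addition, returning a new value.
    public static VectorStruct operator +(VectorStruct a, VectorStruct b)
    {
        return new VectorStruct(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }
}

public static class VectorStructDemo
{
    public static void Main()
    {
        VectorStruct a = new VectorStruct(1, 2, 3);
        VectorStruct b = a;      // copies all three doubles
        b.X = 10;                // changes the copy only
        VectorStruct sum = a + b;
        Console.WriteLine("{0} {1} {2}", a.X, b.X, sum.X);   // prints "1 10 11"
    }
}
```

One thing to keep in mind from the guidance quoted in the post: methods whose formal arguments are structs were listed as not inlined, so an operator like the one above is worth checking in the optimized disassembly before relying on it in a tight loop.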
I think instead of doing Ctrl-F5 to get JIT optimization, unchecking “Suppress JIT optimization on module load” should also optimize the code
http://blogs.msdn.com/saraford/archive/2008/08/13/did-you-know-how-to-optimize-your-code-for-a-build-290.aspx
Regards,
By abhishek on Aug 20, 2008
Not suppressing JIT optimization will work too, thanks for pointing it out. This gives you the attach debugger behavior all the time which might or might not be what you want.
Thanks!
By Ade Miller on Aug 21, 2008