Using CUDA and Thrust with Visual Studio 2010

Sunday, March 6, 2011 – 11:16 AM

Using CUDA 4.0 RC2? Read the update post here.

I was working on setting up some new CUDA projects as I'm doing some spiking (prototyping, for the not-so-agile crowd) to figure out how best to use CUDA 4.0. I've turned it into a quick tutorial on how to write a simple application that lets you use both CUDA and the latest C++0x features in Visual Studio 2010.

Because the current CUDA SDK requires projects to compile using the v90 toolset (Visual Studio 2008), the solution requires two projects: one DLL project containing the CUDA code and targeting v90, and a second application project targeting v100 (VS 2010) and containing the C++ code.


Installing dependencies

Make sure you have the following installed:

Visual Studio 2010
Visual Studio 2008 (its C++ compiler provides the v90 toolset required by CUDA)
The NVIDIA CUDA Toolkit 4.0 RC (or 3.2)
Thrust (included with CUDA 4.0; a separate download if you are using 3.2)

I built this walkthrough using the 4.0 RC, but it should also work with 3.2.

Setting up the solution

Create a solution containing two projects. Two projects are required because one targets the v100 (VS 2010) compiler to allow access to the latest C++0x language features, and the other targets the v90 (VS 2008) compiler, which is required by CUDA.

1) Create a Win32 console application called HelloWorld. Select the defaults for the remaining pages in the wizard. This project will contain the main entry point to your application and any Windows-specific code, like the Parallel Patterns Library (PPL) code used for managing threads.

1_HelloWorld

2) Create a second Win32 project called HelloWorldCuda. This is the DLL that will contain your CUDA code. In the application settings screen select DLL for the application type and check the empty project box.

2_HelloWorldCuda

2a_HelloWorldCuda

Configure the CUDA project

There are a number of settings that need to be configured on the HelloWorldCuda project.

3) Configure the HelloWorldCuda project.

3.1) Select the Project | Build Customizations… menu item. In the dialog select the CUDA 4.0 item. This adds support for CUDA C/C++ files but there needs to be a .CU file in the project before the build settings appear in the project properties. If you don’t have CUDA 4.0 then use the 3.2 rules.

3_1_HelloWorldCuda

3.2) Add two new items to the project: a C++ file (.cpp) and a header file (.h) called Hello.cpp and Hello.h. Rename the .cpp file to Hello.cu. Your solution should look like this:

3_2_HelloWorldCuda

3.3) Select the Hello.cu file and open its property pages. On the General tab, change the Item Type to "CUDA C/C++".

3_3_HelloWorldCuda

3.4) Select the project and open its properties (Alt-Enter). On the General tab, set the Platform Toolset field to v90. (If you are not able to do this then you probably don't have VS 2008 installed, which is required by CUDA.)

3_4_HelloWorldCuda
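
As an aside, if you want the build to fail loudly whenever the wrong toolset sneaks back in, a small preprocessor guard can go at the top of Hello.cu. This is my own addition, not part of the original walkthrough; it assumes the VS 2008 compiler reports _MSC_VER as 1500, and it must go in Hello.cu only (Hello.h is also included by the VS 2010 project, where the check would fail by design).

// My own addition, not part of the original sample: make the CUDA project
// refuse to build with anything other than the VS 2008 (v90) compiler,
// which reports _MSC_VER as 1500.
#if defined(_MSC_VER) && (_MSC_VER != 1500)
#error "HelloWorldCuda must be built with the v90 (Visual Studio 2008) toolset."
#endif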

3.5) Open the Linker | General properties page and add “$(CUDA_PATH_V4_0)\lib\$(Platform);” to the Additional Library Directories field.

3_5_HelloWorldCuda

Note that the CUDA C/C++ properties tab is now visible.

3.6) Open the Linker | Input properties page and add “cudart.lib;” to the Additional Dependencies field.

3_6_HelloWorldCuda

3.7) Make sure that your projects will always build in the correct order. Right click on the HelloWorld project and select Project Dependencies. Check the box next to HelloWorldCuda. This will force the HelloWorldCuda project to build before HelloWorld.

4) Build the solution. At this point the solution should build without any warnings or errors. It doesn't do anything yet, but all the pieces are in place.
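
If you want to convince yourself that the CUDA toolchain is wired up correctly before any Thrust code goes in, a throwaway export like the sketch below can be dropped into Hello.cu temporarily. This is my own sanity check, not part of the sample, and GetCudaDeviceCount is a hypothetical name; it simply asks the CUDA runtime how many devices it can see, which exercises both the NVCC compile step and the cudart.lib link.

// Temporary sanity check (not part of the final sample): exporting a
// function that queries the CUDA runtime proves that Hello.cu is being
// compiled by NVCC and that cudart.lib is being linked.
#include <cuda_runtime.h>

extern "C" __declspec(dllexport) int GetCudaDeviceCount()
{
    int count = 0;
    // cudaGetDeviceCount reports the number of CUDA-capable devices,
    // or an error code if the runtime or driver is not set up correctly.
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return -1;
    return count;
}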

Adding some CUDA/Thrust code

Now it's time to add some code. We need to write some CUDA code in the HelloWorldCuda DLL and export it so that the HelloWorld application can execute it.

5) Configure the HelloWorld project. It needs to link against the HelloWorldCuda DLL and also have access to the appropriate header files.

5.1) Open the Linker | General properties page and add “..\$(Configuration);$(CUDA_PATH_V4_0)\lib\$(Platform);” to the Additional Library Directories field.

5_1_HelloWorld

5.2) Open the Linker | Input properties page and add “cudart.lib;HelloWorldCuda.lib;” to the Additional Dependencies field.

5_2_HelloWorld

5.3) Open the C/C++ | General properties page and add “..\HelloWorldCuda\;$(CUDA_PATH_V4_0)\Include;” to the Additional Include Directories field.

5_3_HelloWorld

5.4) Open the Project | Project Dependencies menu item and check the HelloWorldCuda box to make the CUDA project a dependency of the main Win32 application project (this is the same dependency you set in step 3.7, so the box may already be checked).

5_4_HelloWorld

6) Now it’s time to write some code. CUDA 4.0 now comes with Thrust so we’re going to use Thrust in our example. If you’re not using 4.0 then you need to download the latest Thrust library (link below) and copy it into a Thrust folder inside the CUDA SDK include folder %CUDA_PATH%\include\thrust.
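
If you are not sure which copy of Thrust your include path is resolving to, thrust/version.h exposes version macros you can print. A quick console check along these lines (my own sketch, not part of the sample) can be run from any host project that has the include directory set up:

// Print the Thrust version found on the include path.
// THRUST_MAJOR_VERSION / THRUST_MINOR_VERSION come from thrust/version.h.
#include <iostream>
#include "thrust/version.h"

int main()
{
    std::cout << "Thrust v" << THRUST_MAJOR_VERSION << "."
              << THRUST_MINOR_VERSION << std::endl;
    return 0;
}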

This is a Hello World application so the code is very simple. It’s a variation of the first example on the Thrust project homepage.

Add the following class declaration to Hello.h. Most of the code is there to fix up compilation warnings. Really, all this does is declare a class that is constructed from a host_vector<unsigned long> and has a couple of methods that execute CUDA code and return results.

#pragma once
#pragma warning(push)
#pragma warning(disable: 4996)      // Thrust's use of strerror
#pragma warning(disable: 4251)      // STL class exports
#include "thrust/host_vector.h"
#include "thrust/device_vector.h"
#pragma warning(pop)

// See: http://support.microsoft.com/default.aspx?scid=KB;EN-US;168958
//      http://msdn.microsoft.com/en-us/library/esew7y1w.aspx
//      http://www.unknownroad.com/rtfm/VisualStudio/warningC4251.html

#if defined(__CUDACC__)
#   define DECLSPECIFIER  __declspec(dllexport)
#   define EXPIMP_TEMPLATE
#else
#   define DECLSPECIFIER  __declspec(dllimport)
#   define EXPIMP_TEMPLATE extern
#endif

#pragma warning(push)
#pragma warning(disable: 4231)      // 'extern' before template explicit instantiation
EXPIMP_TEMPLATE template class
    DECLSPECIFIER thrust::device_vector<unsigned long>;
EXPIMP_TEMPLATE template class
    DECLSPECIFIER thrust::detail::vector_base<unsigned long,
        thrust::device_malloc_allocator<unsigned long> >;
#pragma warning(pop)

class DECLSPECIFIER Hello
{
private:
    thrust::device_vector<unsigned long> m_device_data;

public:
    Hello(const thrust::host_vector<unsigned long>& data);
    unsigned long Sum();
    unsigned long Max();
};

Hello.cu defines the constructor and the Sum and Max methods. The constructor copies the data onto the device, while the Sum and Max methods call Thrust algorithms to carry out calculations on the GPU.

#include "Hello.h"

// Thrust algorithms and functors used below.
#include "thrust/reduce.h"
#include "thrust/extrema.h"
#include "thrust/functional.h"

Hello::Hello(const thrust::host_vector<unsigned long>& data)
{
    // Assigning a host_vector to a device_vector copies the data to the GPU.
    m_device_data = data;
}

unsigned long Hello::Sum()
{
    // Note the 0UL initial value, so the reduction accumulates as unsigned long.
    return thrust::reduce(m_device_data.cbegin(), m_device_data.cend(),
        0UL, thrust::plus<unsigned long>());
}

unsigned long Hello::Max()
{
    return *thrust::max_element(m_device_data.cbegin(), m_device_data.cend(),
        thrust::less<unsigned long>());
}
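
One thing worth knowing when you start from a sample this small: Thrust reports CUDA failures by throwing exceptions rather than returning error codes, so if the device code misbehaves on your machine the program simply terminates. Below is a sketch of Sum with basic error reporting added. This is my own variation, not part of the original sample, and it assumes the Thrust bundled with CUDA 4.0 provides thrust::system_error in thrust/system_error.h; if your Thrust predates that header, catching std::runtime_error is a reasonable fallback.

// Variation of Hello::Sum with basic error reporting (my sketch, not part
// of the original sample). Thrust surfaces CUDA failures as exceptions.
#include <iostream>
#include "thrust/system_error.h"

unsigned long Hello::Sum()
{
    try
    {
        return thrust::reduce(m_device_data.cbegin(), m_device_data.cend(),
            0UL, thrust::plus<unsigned long>());
    }
    catch (const thrust::system_error& e)
    {
        // Report the underlying CUDA error and rethrow for the caller.
        std::cerr << "CUDA/Thrust error: " << e.what() << std::endl;
        throw;
    }
}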

Finally HelloWorld.cpp contains the application’s entry point and executes the CUDA/Thrust code. It also calculates the answers on the host’s CPU so that you can check for correctness.

#include "stdafx.h"
#include <iostream>
#include <algorithm>
#include <vector>
#include <cstdlib>      // rand

// Thrust algorithms and functors used on the host side.
#include "thrust/generate.h"
#include "thrust/reduce.h"
#include "thrust/extrema.h"
#include "thrust/functional.h"

#include "Hello.h"

using namespace ::std;

int _tmain(int argc, _TCHAR* argv[])
{
    cout << "Generating data..." << endl;
    thrust::host_vector<unsigned long> host_data(100000);
    thrust::generate(host_data.begin(), host_data.end(), rand);
    cout << "generated " << host_data.size() << " numbers" << endl;

    cout << endl << "Running host code..." << endl;
    unsigned long host_result = thrust::reduce(host_data.cbegin(), host_data.cend(),
                                               0UL, thrust::plus<unsigned long>());
    cout << "The sum is: " << host_result << endl;

    host_result = *thrust::max_element(host_data.cbegin(), host_data.cend(),
                                       thrust::less<unsigned long>());
    cout << "The max is: " << host_result << endl;

    cout << endl << "Copying data to device..." << endl;
    Hello hello(host_data);

    cout << endl << "Running CUDA device code..." << endl;
    unsigned long device_result = hello.Sum();
    cout << "The sum is: " << device_result << endl;

    cout << endl << "Running CUDA device code..." << endl;
    device_result = hello.Max();
    cout << "The max is: " << device_result << endl;

    return 0;
}

Run the application and you should see the following output:

6_Console
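
The host and device answers should match. If you would rather have the program check that than eyeball the console, a small helper along these lines would do it. This is my own addition, not part of the sample; the CheckResult name is hypothetical, and it assumes you keep the host sum and max in their own locals (say host_sum and host_max) instead of reusing host_result.

// My own addition, not part of the sample: compare a host result with the
// corresponding device result and report any mismatch.
#include <iostream>

inline bool CheckResult(const char* name, unsigned long host, unsigned long device)
{
    if (host == device)
        return true;
    std::cout << name << " mismatch: host=" << host
              << ", device=" << device << std::endl;
    return false;
}

It would be called as, for example, CheckResult("Sum", host_sum, hello.Sum()).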

You may see lots of Thrust/CUDA warnings along the lines of "Cannot tell what pointer points to…". This appears to be a known issue; the warnings only show up when the NVCC compiler's -G0 flag is set and/or the project is compiling against arch sm_10.

Making use of the Parallel Patterns Library and C++ lambdas

So now we have a Win32 application that runs CUDA code using the Thrust template library, but we could have done all of that with a single project targeting the v90 toolset. The payoff of the two-project approach is that the application project can now use VS 2010 features such as C++0x lambdas and the Parallel Patterns Library (PPL). Update the HelloWorld.cpp file to use the parallel_invoke algorithm to run the host and device code in parallel.

#include "stdafx.h"
#include <iostream>
#include <algorithm>
#include <vector>
#include <cstdlib>      // rand
#include "ppl.h"

// Thrust algorithms and functors used on the host side.
#include "thrust/generate.h"
#include "thrust/reduce.h"
#include "thrust/extrema.h"
#include "thrust/functional.h"

#include "Hello.h"

using namespace ::std;
using namespace ::Concurrency;

int _tmain(int argc, _TCHAR* argv[])
{
    cout << "Generating data..." << endl;
    thrust::host_vector<unsigned long> host_data(100000);
    thrust::generate(host_data.begin(), host_data.end(), rand);
    cout << "generated " << host_data.size() << " numbers" << endl;

    // Run the host calculation and the device calculation in parallel.
    parallel_invoke(
        [host_data]()
        {
            cout << endl << "Running host code..." << endl;
            unsigned long host_result = thrust::reduce(host_data.cbegin(),
                host_data.cend(), 0UL, thrust::plus<unsigned long>());
            cout << "The sum is: " << host_result << endl;
            host_result = *thrust::max_element(host_data.cbegin(),
                host_data.cend(), thrust::less<unsigned long>());
            cout << "The max is: " << host_result << endl;
        },
        [host_data]()
        {
            cout << endl << "Copying data to device..." << endl;
            Hello hello(host_data);

            cout << endl << "Running CUDA device code..." << endl;
            unsigned long device_result = hello.Sum();
            cout << "The sum is: " << device_result << endl;

            cout << endl << "Running CUDA device code..." << endl;
            device_result = hello.Max();
            cout << "The max is: " << device_result << endl;
        }
    );
    return 0;
}

Notice how the output ordering has changed. The call to parallel_invoke takes two lambda expressions containing code that is now run in parallel.

7_Console
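
One design note on the lambdas: the sample captures host_data by value, which copies the 100,000-element vector into each lambda. Because parallel_invoke does not return until both lambdas have finished, capturing by reference is also safe here (both lambdas only read the data) and avoids the copies. Below is a sketch of the same call with reference captures and the results collected into locals; this is my variation, not the original code, and the host_sum and device_sum names are hypothetical.

// My variation on the parallel_invoke call in _tmain above: capture
// host_data by reference (it is only read) and write the results into
// locals, which is safe because parallel_invoke blocks until both
// lambdas have completed.
unsigned long host_sum = 0;
unsigned long device_sum = 0;

parallel_invoke(
    [&host_data, &host_sum]()
    {
        host_sum = thrust::reduce(host_data.cbegin(), host_data.cend(),
            0UL, thrust::plus<unsigned long>());
    },
    [&host_data, &device_sum]()
    {
        Hello hello(host_data);    // copies the data to the device
        device_sum = hello.Sum();
    });

cout << "Host sum: " << host_sum << ", device sum: " << device_sum << endl;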

The complete code for this sample is available here.

Other resources

How to create/upgrade a CUDA project in VS2008 and VS2010 to work with Nsight 1.5 and CUDA 3.2 (NVIDIA forum post)

Thrust (Project homepage on Google Code)

The Parallel Patterns Library

Lambda expressions in C++ Visual Studio 2010

22 Responses to “Using CUDA and Thrust with Visual Studio 2010”

    Hello, I was wondering if you have heard if there are any plans for CUDA to work under VS2010 in the near future (I mean, compile with the 2010 compiler without needing the 2008 one).

    Thanks!

    By Michel on Mar 14, 2011

    Michel,

    I’m not aware of what NVIDIA’s plans are here. I spoke to a couple of people from NVIDIA at GTC and they understand that it’s an issue for Windows developers.

    Ade

    By Ade Miller on Mar 14, 2011

    Thanks Ade!

    By Michel on Mar 14, 2011

    Hi Ade,

    Thanks a lot for the detailed explanation; things such as these are sorely lacking, in my opinion. Your example worked great, but I would like to use a static library instead of a DLL. I haven't managed to make that compile so far, though. If you could give me any pointers, it would be much appreciated!

    Regards,
    Eelco Hoogendoorn

    By Eelco on Mar 18, 2011

    To be more specific: if I change the CUDA DLL project to a LIB project, and instead of setting the 'Additional Library Directories' and 'Additional Dependencies' at the linker, set the corresponding fields at the librarian, things don't work out.

    Rebuilding just the CUDA project succeeds without errors, which is nonsense: the .LIB never gets created, as is revealed upon a build of the whole solution.

    Any ideas?

    Regards,
    Eelco

    By Eelco on Mar 18, 2011

    Thank you so much for this work!! I was lacking this severely for my research.

    By Emre Turkoz on Mar 24, 2011

    Hi Ade,
    First of all: thanks a lot for your efforts to publish a public guide to setting up the VS2010 environment for CUDA. It was already very helpful.
    As a complete VS2010 newbie stumbling from inexperienced Linux CUDA development into Nsight, I tried your small guide to get me started setting up the environment.
    Everything works fine (YEAHAW)… but I have some small problems getting things done the way I need them. The main problem is that I'm an inexperienced programmer.
    1. I'm not used to DLLs.
    2. I'm not used to compiling x64 projects.
    Nevertheless I need to compile x64 code as I have high memory requirements in my apps. Therefore I tried to modify your walkthrough to change everything to “x64”. But doing so results in a set of linker errors and I cannot find the issue (most likely because I didn't understand some basics somewhere).
    So question 1 is: how do I compile this example project for x64?
    The second one is: I have no clue about DLLs and Thrust, and I also don't feel very comfortable with all the declspec and pragma stuff I don't understand.
    So what I need (and most likely also many others who are not at your level of experience) is a “keep it simple” example. Plain CUDA. No DLLs (if possible?), set up for both x64 and Win32, with debug and optimized release configs. This would be the reasonable starting point for most beginners. Any chance of getting you to help us out in exchange for fame?
    cheers, markus

    By Markus on Mar 25, 2011

    Hi Markus,

    I'm on the road at the moment but it looks like this should be my next project. I can certainly help with the x64 configuration (most of my projects compile this way). Using a LIB rather than a DLL may be more difficult. I talked to a couple of people here about static linking and the PPL and there may be a few gotchas.

    Watch this space, maybe I’ll get to it this weekend.

    Ade

    By Ade Miller on Mar 30, 2011

    Markus,

    There's a new post that shows you how to do x64 builds: Using CUDA and Thrust with VS 2010 Part 2: x64 Builds.

    Ade

    By Ade Miller on Apr 9, 2011

    Markus,

    Here’s another blog post that covers the simpler scenario of a single project. This may be helpful for you.

    http://blog.cuvilib.com/2011/02/24/how-to-run-cuda-in-visual-studio-2010/

    The reason I use two projects is because I want to use the new C++0x features and the Parallel Patterns Library in my application. You cannot do this with a single project because the CUDA project needs to use the VS 2008 C++ compiler. I’ll have a look at supporting libs.

    Ade

    By Ade Miller on Apr 9, 2011

    Just thought I should mention that CUDA 4 RC2 supports the V100 compiler and I was able to successfully create x64/win32 CUDA 4 projects under VS2010 using your two guides. You no longer have to create two separate projects and/or designate the CUDA project as a DLL.

    By Carlos on Apr 14, 2011

    Carlos,

    Excellent news about RC2. I'll be upgrading at some point and writing a better, hopefully shorter blog post about how to do this.

    Ade

    By Ade Miller on Apr 22, 2011

    That was a really, really useful article; this was the best guide I found on the net. Thank you.

    By nafiseh on Jul 11, 2011

    Dear Sir, your example is very good for getting started with CUDA and Visual Studio, but I am facing some problems. As you said, I entered “cudart.lib” in the linker's Input settings, but after building it gives the error:
    LINK : fatal error LNK1104: cannot open file ‘cudart.lib’
    Please guide me in this regard.

    Thanks

    By hamid on Oct 4, 2011

    cudart.lib needs to be on your library search path. Do you have the CUDA SDK installed? Is the CUDA_PATH_V4_0 environment variable set?

    Ade

    By Ade Miller on Nov 5, 2011
