Using CUDA and Thrust with Visual Studio 2010
Sunday, March 6, 2011 – 11:16 AM
Using CUDA 4.0 RC2? Read the update post here.
I was working on setting up some new CUDA projects as I'm doing some spiking (prototyping, for the not so agile crowd) to figure out how best to use CUDA 4.0. I've turned it into a quick tutorial on how to write a simple application that lets you use both CUDA and the latest C++0x features in Visual Studio 2010.
Because the current CUDA SDK requires projects to compile using the v90 toolset (Visual Studio 2008), the solution requires two projects: one DLL project containing the CUDA code and targeting v90, and a second application project targeting v100 (VS 2010) containing the C++ code.
Installing dependencies
Make sure you have the following installed.
- Visual Studio 2010 and Visual Studio 2008 SP1 (the latter is required by CUDA).
- Parallel Nsight 1.51
- CUDA 4.0 RC or 3.2 (plus Thrust)
I built this walkthrough using the 4.0 RC, but it should also work with 3.2.
Setting up the solution
Create a solution containing two projects. Two projects are required because one targets the v100 (VS 2010) compiler to allow access to the latest C++0x language features, while the other targets the v90 (VS 2008) compiler, which is required by CUDA.
1) Create a Win32 console application called HelloWorld. Select the defaults for the remaining pages in the wizard. This project will contain the main entry point to your application and any Windows specific code, like the Parallel Patterns Library (PPL) code used for managing threads.
2) Create a second Win32 project called HelloWorldCuda. This is the DLL that will contain your CUDA code. In the application settings screen select DLL for the application type and check the empty project box.
Configure the CUDA project
There are a number of settings that need to be configured on the HelloWorldCuda project.
3) Configure the HelloWorldCuda project.
3.1) Select the Project | Build Customizations… menu item. In the dialog select the CUDA 4.0 item. This adds support for CUDA C/C++ files, but there needs to be a .cu file in the project before the CUDA build settings appear in the project properties. If you don't have CUDA 4.0 then use the 3.2 rules.
3.2) Add two new items to the project: a C++ file (Hello.cpp) and a header file (Hello.h), then rename the .cpp file to Hello.cu. Your solution should now contain both projects, with Hello.cu and Hello.h in the HelloWorldCuda project.
3.3) Select the Hello.cu file and open its properties pages. In the General tab change the Item Type to "CUDA C/C++".
3.4) Select the project and open the properties (ALT-Enter). In the General tab set the Platform Toolset field to v90 (if you are not able to do this then you probably don't have VS 2008 installed, which is required by CUDA).
3.5) Open the Linker | General properties page and add "$(CUDA_PATH_V4_0)\lib\$(Platform);" to the Additional Library Directories field.
Note that the CUDA C/C++ properties tab is now visible.
3.6) Open the Linker | Input properties page and add “cudart.lib;” to the Additional Dependencies field.
3.7) Make sure that your projects will always build in the correct order. Right-click on the HelloWorld project and select Project Dependencies. Check the box next to HelloWorldCuda. This will force the HelloWorldCuda project to build before HelloWorld.
4) Build the solution. At this point the solution should build without any warnings or errors. It doesn't do anything yet, but all the pieces are in place.
Adding some CUDA/Thrust code
Now it's time to add some code. We need to write some CUDA code in the HelloWorldCuda DLL and export it so that the HelloWorld application can execute it.
5) Configure the HelloWorld project. It needs to link against the HelloWorldCuda DLL and also have access to the appropriate header files.
5.1) Open the Linker | General properties page and add "..\$(Configuration);$(CUDA_PATH_V4_0)\lib\$(Platform);" to the Additional Library Directories field.
5.2) Open the Linker | Input properties page and add “cudart.lib;HelloWorldCuda.lib;” to the Additional Dependencies field.
5.3) Open the C/C++ | General properties page and add "..\HelloWorldCuda\;$(CUDA_PATH_V4_0)\Include;" to the Additional Include Directories field.
5.4) Open the Project | Project Dependencies menu item and check the HelloWorldCuda box to make the CUDA project a dependency of the main Win32 application project (this is the same dependency you set in step 3.7, so it may already be checked).
6) Now it's time to write some code. CUDA 4.0 comes with Thrust, so we're going to use Thrust in our example. If you're not using 4.0 then you need to download the latest Thrust library (link below) and copy it into a thrust folder inside the CUDA SDK include folder, %CUDA_PATH%\include\thrust.
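If you do have to install Thrust yourself, a quick way to confirm that the headers ended up in the right place is to build a tiny console program that prints the Thrust version. This is just a verification sketch (it assumes only the THRUST_MAJOR_VERSION and THRUST_MINOR_VERSION macros from thrust/version.h):

#include <thrust/version.h>
#include <iostream>

int main()
{
    // If this compiles and runs, the Thrust headers are on the include path.
    std::cout << "Thrust v" << THRUST_MAJOR_VERSION << "."
              << THRUST_MINOR_VERSION << std::endl;
    return 0;
}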
This is a Hello World application so the code is very simple. It’s a variation of the first example on the Thrust project homepage.
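For reference, the front-page Thrust example looks roughly like this (reproduced from memory, so treat it as a sketch rather than a verbatim copy): it generates random numbers on the host, copies them to the device, and sorts them on the GPU.

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdlib>

int main()
{
    // Generate random numbers on the host.
    thrust::host_vector<int> h_vec(1 << 20);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);

    // Copy the data to the device, sort it on the GPU, and copy it back.
    thrust::device_vector<int> d_vec = h_vec;
    thrust::sort(d_vec.begin(), d_vec.end());
    thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
    return 0;
}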
Add the following class declaration to Hello.h. Most of the code exists to fix up compilation warnings. All it really does is declare a class that is constructed from a host_vector<unsigned long> and then has some methods that execute CUDA code and return results.
#pragma once
#pragma warning(push)
#pragma warning(disable: 4996) // Thrust's use of strerror
#pragma warning(disable: 4251) // STL class exports
#include "thrust/host_vector.h"
#include "thrust/device_vector.h"
#pragma warning(pop)

// See: http://support.microsoft.com/default.aspx?scid=KB;EN-US;168958
//      http://msdn.microsoft.com/en-us/library/esew7y1w.aspx
//      http://www.unknownroad.com/rtfm/VisualStudio/warningC4251.html

#if defined(__CUDACC__)
#   define DECLSPECIFIER __declspec(dllexport)
#   define EXPIMP_TEMPLATE
#else
#   define DECLSPECIFIER __declspec(dllimport)
#   define EXPIMP_TEMPLATE extern
#endif

// Export the template instantiations used by the Hello class so that the
// importing application links against the DLL's copies.
#pragma warning(push)
#pragma warning(disable: 4231) // nonstandard extension: 'extern' before explicit instantiation
EXPIMP_TEMPLATE template class DECLSPECIFIER thrust::device_vector<unsigned long>;
EXPIMP_TEMPLATE template class DECLSPECIFIER thrust::detail::vector_base<unsigned long,
    thrust::device_malloc_allocator<unsigned long>>;
#pragma warning(pop)

class DECLSPECIFIER Hello
{
private:
    thrust::device_vector<unsigned long> m_device_data;

public:
    Hello(const thrust::host_vector<unsigned long>& data);
    unsigned long Sum();
    unsigned long Max();
};
Hello.cu defines the constructor and the Sum and Max methods. The constructor copies the data onto the device, while the Sum and Max methods call Thrust algorithms to carry out calculations on the GPU.
#include "Hello.h"

#include "thrust/reduce.h"
#include "thrust/extrema.h"
#include "thrust/functional.h"

Hello::Hello(const thrust::host_vector<unsigned long>& data)
{
    // Assigning a host_vector to a device_vector copies the data onto the device.
    m_device_data = data;
}

unsigned long Hello::Sum()
{
    // Use 0UL as the initial value so the reduction accumulates as unsigned long.
    return thrust::reduce(m_device_data.cbegin(), m_device_data.cend(),
        0UL, thrust::plus<unsigned long>());
}

unsigned long Hello::Max()
{
    return *thrust::max_element(m_device_data.cbegin(), m_device_data.cend(),
        thrust::less<unsigned long>());
}
Finally HelloWorld.cpp contains the application’s entry point and executes the CUDA/Thrust code. It also calculates the answers on the host’s CPU so that you can check for correctness.
#include "stdafx.h"
#include <iostream>
#include <algorithm>
#include <vector>
#include <cstdlib>

#include "thrust/generate.h"
#include "thrust/reduce.h"
#include "thrust/extrema.h"
#include "thrust/functional.h"

#include "Hello.h"

using namespace ::std;

int _tmain(int argc, _TCHAR* argv[])
{
    cout << "Generating data..." << endl;
    thrust::host_vector<unsigned long> host_data(100000);
    thrust::generate(host_data.begin(), host_data.end(), rand);
    cout << "generated " << host_data.size() << " numbers" << endl;

    cout << endl << "Running host code..." << endl;
    unsigned long host_result = thrust::reduce(host_data.cbegin(), host_data.cend(),
        0UL, thrust::plus<unsigned long>());
    cout << "The sum is: " << host_result << endl;

    host_result = *thrust::max_element(host_data.cbegin(), host_data.cend(),
        thrust::less<unsigned long>());
    cout << "The max is: " << host_result << endl;

    cout << endl << "Copying data to device..." << endl;
    Hello hello(host_data);

    cout << endl << "Running CUDA device code..." << endl;
    unsigned long device_result = hello.Sum();
    cout << "The sum is: " << device_result << endl;

    cout << endl << "Running CUDA device code..." << endl;
    device_result = hello.Max();
    cout << "The max is: " << device_result << endl;

    return 0;
}
Run the application and you should see output reporting the number of values generated, followed by the sum and maximum computed on the host and then on the device; the host and device results should match.
You may see lots of "Cannot tell what pointer points to…" warnings (see Resolving Thrust/CUDA warnings "Cannot tell what pointer points to…"). This appears to be a known issue. The warnings only appear when the NVCC compiler's -G0 flag is set and/or the project is compiling against arch sm_10.
Making use of the Parallel Patterns Library and C++ lambdas
So now we have a Win32 application that runs CUDA code using the Thrust template library. So far, though, we could have done all of this with a single project targeting the v90 toolset. To take advantage of the v100 compiler, update the HelloWorld.cpp file to use the parallel_invoke algorithm to run the host and device code in parallel.
#include "stdafx.h"
#include <iostream>
#include <algorithm>
#include <vector>
#include <cstdlib>
#include "ppl.h"

#include "thrust/generate.h"
#include "thrust/reduce.h"
#include "thrust/extrema.h"
#include "thrust/functional.h"

#include "Hello.h"

using namespace ::std;
using namespace ::Concurrency;

int _tmain(int argc, _TCHAR* argv[])
{
    cout << "Generating data..." << endl;
    thrust::host_vector<unsigned long> host_data(100000);
    thrust::generate(host_data.begin(), host_data.end(), rand);
    cout << "generated " << host_data.size() << " numbers" << endl;

    parallel_invoke(
        [host_data]()
        {
            cout << endl << "Running host code..." << endl;
            unsigned long host_result = thrust::reduce(host_data.cbegin(),
                host_data.cend(), 0UL, thrust::plus<unsigned long>());
            cout << "The sum is: " << host_result << endl;

            host_result = *thrust::max_element(host_data.cbegin(),
                host_data.cend(), thrust::less<unsigned long>());
            cout << "The max is: " << host_result << endl;
        },
        [host_data]()
        {
            cout << endl << "Copying data to device..." << endl;
            Hello hello(host_data);

            cout << endl << "Running CUDA device code..." << endl;
            unsigned long device_result = hello.Sum();
            cout << "The sum is: " << device_result << endl;

            cout << endl << "Running CUDA device code..." << endl;
            device_result = hello.Max();
            cout << "The max is: " << device_result << endl;
        }
    );
    return 0;
}
Notice how the output ordering has changed. The call to parallel_invoke takes two lambda expressions containing code that is now run in parallel.
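If parallel_invoke is new to you: it accepts two or more callable objects, may run them on different worker threads, and does not return until all of them have completed. A stripped-down sketch, independent of the CUDA sample:

#include <iostream>
#include <ppl.h>

int main()
{
    using namespace Concurrency;

    int a = 0;
    int b = 0;
    // The two lambdas may run concurrently; parallel_invoke blocks
    // until both have finished.
    parallel_invoke(
        [&a]() { a = 1; },
        [&b]() { b = 2; });
    std::cout << a + b << std::endl; // always prints 3
    return 0;
}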
The complete code for this sample is available here.
Other resources
How to create/upgrade a CUDA project in VS2008 and VS2010 to work with Nsight 1.5 and CUDA 3.2 (NVIDIA forum post)
Thrust (Project homepage on Google Code)
Lambda expressions in C++ Visual Studio 2010
22 Responses to “Using CUDA and Thrust with Visual Studio 2010”
Hello, I was wondering if you have heard whether there are any plans for CUDA to work under VS2010 in the near future (I mean, compile with the 2010 compiler without needing the 2008 one).
Thanks!
By Michel on Mar 14, 2011
Michel,
I’m not aware of what NVIDIA’s plans are here. I spoke to a couple of people from NVIDIA at GTC and they understand that it’s an issue for Windows developers.
Ade
By Ade Miller on Mar 14, 2011
Thanks Ade!
By Michel on Mar 14, 2011
Hi Ade,
Thanks a lot for the detailed explanation; things such as these are sorely lacking, in my opinion. Your example worked great, but I would like to use a static library instead of a DLL. I haven't managed to make that compile so far, though. If you could give me any pointers, it would be much appreciated!
Regards,
Eelco Hoogendoorn
By Eelco on Mar 18, 2011
To be more specific: if I change the CUDA DLL project to a LIB project, and instead of setting the 'Additional Library Directories' and 'Additional Dependencies' at the linker, set the corresponding fields at the librarian, things don't work out.
Rebuilding just the CUDA project succeeds without errors, which is nonsense: the .LIB never gets created, as is revealed by a build of the whole solution.
Any ideas?
Regards,
Eelco
By Eelco on Mar 18, 2011
Thank you so much for this work!! I was lacking this severely for my research.
By Emre Turkoz on Mar 24, 2011
Hi Ade,
First of all: Thanks a lot for your efforts to give a public guide to setup the VS2010 environment for CUDA. It was already very helpful.
As a complete VS2010 newbie stumbling from inexperienced Linux CUDA development into Nsight, I tried your small guide to get me started setting up the environment.
Everything works fine (YEAHAW)… but I have some small problems getting things done the way I need them. The main problem is that I'm an inexperienced programmer.
1. I’m not used to dlls.
2. I’m not used to compiling x64 projects.
Nevertheless I need to compile x64 code as I have high memory requirements in my apps. Therefore I tried to modify your walkthrough to change everything to “x64”. But doing so results in a set of linker errors and I cannot find the issue (most likely because I didn’t understand some basics somewhere).
So question 1 is: how do I compile this example project for x64?
The second one is: I have no clue about DLLs and Thrust, and I also don't feel very comfortable with all this declspec and pragma stuff I don't understand.
So what I need (and most likely also many others who are not at your level of experience, which lets you work out all the settings on your own) is a "keep it simple" example. Plain CUDA. No DLLs (if possible?), set up for both x64 and Win32, with debug and optimized release configurations. This would be a reasonable starting point for most beginners. Any chance of tempting you to help us by dangling fame?
cheers, markus
By Markus on Mar 25, 2011
Hi Markus,
I'm on the road at the moment but it looks like this should be my next project. I can certainly help with the x64 configuration (most of my projects compile this way). Using a LIB rather than a DLL may be more difficult. I talked to a couple of people here about static linking and the PPL, and there may be a few gotchas.
Watch this space, maybe I’ll get to it this weekend.
Ade
By Ade Miller on Mar 30, 2011
Markus,
There's a new post that shows you how to do x64 builds: Using CUDA and Thrust with VS 2010 Part 2: x64 Builds
Ade
By Ade Miller on Apr 9, 2011
Markus,
Here’s another blog post that covers the simpler scenario of a single project. This may be helpful for you.
http://blog.cuvilib.com/2011/02/24/how-to-run-cuda-in-visual-studio-2010/
The reason I use two projects is because I want to use the new C++0x features and the Parallel Patterns Library in my application. You cannot do this with a single project because the CUDA project needs to use the VS 2008 C++ compiler. I’ll have a look at supporting libs.
Ade
By Ade Miller on Apr 9, 2011
Just thought I should mention that CUDA 4 RC2 supports the V100 compiler and I was able to successfully create x64/win32 CUDA 4 projects under VS2010 using your two guides. You no longer have to create two separate projects and/or designate the CUDA project as a DLL.
By Carlos on Apr 14, 2011
Carlos,
Excellent news about RC2. I'll be upgrading at some point and writing a better, hopefully shorter, blog post about how to do this.
Ade
By Ade Miller on Apr 22, 2011
That was a really, really useful article; this was the best guide I found on the net. Thank you.
By nafiseh on Jul 11, 2011
Dear Sir, your example is very good for getting started with CUDA in Visual Studio, but I am facing some problems. As you said, I entered "cudart.lib" in the Linker | Input tab, but after building it gives the error:
LINK : fatal error LNK1104: cannot open file ‘cudart.lib’
please guide me in this regard.
Thanks
By hamid on Oct 4, 2011
cudart.lib needs to be on your library search path. Do you have the CUDA SDK installed? Is the CUDA_PATH_V4_0 environment variable set?
Ade
By Ade Miller on Nov 5, 2011