I’m really hoping this is the last tutorial. It’s gotten a lot simpler to build CUDA on Windows in the last couple of releases.
Update June 19th 2011: I’ve updated the code and text for the RTM release of the CUDA 4.0 toolkit.
Make sure you have the right stack.
Note that you don’t need Visual Studio 2008 any more. The dependency on the VC 9.0 compiler has been removed. This makes things much easier.
Create the project
- Create a Win32 Console Application called CudaHelloWorld. On the application settings page in the wizard check “Empty project” in the additional options menu. For the other settings use the defaults.
- Add a C++ file called “HelloWorld.cpp”.
- Add another C++ file called “Hello.cu” and a header file, “Hello.h”.
Thanks to the improvements in RC2, those are all the files you need. No more linking separate DLLs or installing the Visual Studio 2008 compiler. Your project should now look like this:
To make things easy we’re going to configure the project to support x64 now. This will save time in configuring settings later.
- Select the Build | Configuration Manager menu item.
- In the dialog select the Platform dropdown for the CudaHelloWorld project and pick “<New…>”.
- In the new platform dialog create a new x64 platform and copy the settings from the existing Win32 platform.
Now you have a project that targets both Win32 (x86) and x64.
Now configure the project to compile .cu files with the CUDA compiler.
Select the project in the solution explorer and then select the Project | Build Customizations… menu. In the dialog check the CUDA 4.0 targets.
Now right click on the Hello.cu file and select Properties. Set the Configuration dropdown to “All Configurations” and the Platform dropdown to “All Platforms” so that the settings get applied to all builds. In the Item Type field select “CUDA C/C++”.
Now make sure that the x64 platform sets the NVCC CUDA compiler to also target x64.
- Right click on the CudaHelloWorld project and select Properties.
- Open the Configuration Properties | CUDA C/C++ tree item.
- Select “All Configurations” on the configuration dropdown and “x64” for the platform dropdown.
- Select “64-bit (--machine 64)” for the Target Machine Platform.
Now set the platform dropdown to “All Platforms” and configure the linker options:
Open the Configuration Properties | Linker | Input tree item and add cudart.lib to the list in the Additional Dependencies field.
This will link the CUDA runtime library. For simplicity we’re linking the release version of the library in both release and debug builds.
Add the code
Create the CUDA code. First declare the Hello class in Hello.h:
Add the corresponding definition in the Hello.cu file:
This shows how to use Thrust to calculate the sum and maximum of an array of numbers. The Hello constructor copies a vector of numbers stored on the host into a vector on the device (GPU). The Sum and Max methods call the Thrust library to execute these calculations on the GPU.
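As a rough illustration of the approach described here, the following is a minimal single-file sketch, not the sample's actual listing. It collapses Hello.h and Hello.cu into one .cu file for brevity. The member name `m_deviceData`, the use of `std::vector` for the host input, and the choice of `thrust::reduce` with `thrust::maximum` for the maximum are all assumptions; the real sample may use different types or calls.

```cuda
// Sketch only: compile as a .cu file with nvcc.
#include <cstdio>
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>

class Hello
{
public:
    // Copies the host data into a vector in device (GPU) memory.
    explicit Hello(const std::vector<unsigned long>& data)
        : m_deviceData(data.begin(), data.end())
    {
    }

    // Sum of all elements, reduced on the GPU.
    unsigned long Sum() const
    {
        return thrust::reduce(m_deviceData.begin(), m_deviceData.end(),
                              0UL, thrust::plus<unsigned long>());
    }

    // Largest element, reduced on the GPU. Returns 0 for empty input,
    // since 0 is the identity for max over unsigned values.
    unsigned long Max() const
    {
        return thrust::reduce(m_deviceData.begin(), m_deviceData.end(),
                              0UL, thrust::maximum<unsigned long>());
    }

private:
    thrust::device_vector<unsigned long> m_deviceData; // assumed member name
};

int main()
{
    // Hypothetical input: the numbers 1 to 10.
    std::vector<unsigned long> data;
    for (unsigned long i = 1; i <= 10; ++i)
    {
        data.push_back(i);
    }

    Hello hello(data);
    std::printf("Sum: %lu\n", hello.Sum());
    std::printf("Max: %lu\n", hello.Max());
    return 0;
}
```

In the project layout used in this tutorial, the class declaration would live in Hello.h and the method bodies in Hello.cu, so that only the .cu file is compiled by the CUDA compiler.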
You need to add the code to call this in the HelloWorld.cpp file:
The code runs the same calculation on both the host CPU and the GPU, on different threads. It uses the Parallel Patterns Library (PPL) to create the threads and then executes the CUDA code from one of them. The final output looks like this:
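For the host CPU half of that comparison, a minimal sketch using only the standard library might look like the following. The function names `HostSum` and `HostMax` are hypothetical and are not taken from the sample, and the threading with PPL is left out so the sketch stays portable.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Sum of all elements, computed on the host CPU.
unsigned long HostSum(const std::vector<unsigned long>& data)
{
    return std::accumulate(data.begin(), data.end(), 0UL);
}

// Largest element, computed on the host CPU.
// Returns 0 for empty input so the caller never dereferences end().
unsigned long HostMax(const std::vector<unsigned long>& data)
{
    if (data.empty())
    {
        return 0UL;
    }
    return *std::max_element(data.begin(), data.end());
}
```

Comparing these values with the results from the GPU is a quick sanity check that the copy to the device and the reductions behaved as expected.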
The following no longer applies for the RTM release. Nvidia have fixed this. The code above will still work but using the C++ std::cout also works fine now. I’ve updated the sample code in the link below to use this.
This code differs from the code I showed previously in that it uses printf rather than the STL stream libraries. There’s a bug in the RC2 CUDA release that will give you a lot of linking errors. It looks like this will be fixed for RTM. I’ll update the sample for RTM.
You can download the full sample code from: