Reuse __device__ code

A few days ago I started writing a header file containing operator overloads as device/host functions for the float4 vector type, because CUDA doesn’t define any operators for it by default. So I wrote some code like this:

__host__ __device__
float4 operator+(const float4 &a, const float4 &b)
{
 return make_float4(a.x + b.x, a.y + b.y, a.z + b.z,  a.w + b.w);
}

This code worked perfectly fine, so of course I wanted to reuse it and included the header file in several .cu files to provide the operator overloads to the different kernels. In addition I wanted to use the host version in my unit tests to verify that the implementation is correct. Unfortunately this didn’t work. I got the classic linker error saying something like: “Multiple definitions of operator+(float4, float4)”

After searching the web for many hours I found the solution to my problem. A non-inline function defined in a header is compiled into every file that includes it, so the linker sees the same definition several times. You have to declare these functions inline, which you do by simply prefixing the function with the keyword inline. The keyword allows the same definition to appear in every translation unit that includes the header.

So the new version looks pretty much the same and everything works fine:

inline __host__ __device__
float4 operator+(const float4 &a, const float4 &b)
{
 return make_float4(a.x + b.x, a.y + b.y, a.z + b.z,  a.w + b.w);
}

Now you can include the header file everywhere you need the operator overloads and reuse the device code, and everything works smoothly.
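
To make the header usable from unit tests that are built without nvcc, one possible layout is sketched below. This is only a sketch under a few assumptions that are not part of my original header: the file name float4_ops.h, the macro F4_HOST_DEVICE, the extra operator- and the helper check_float4_ops() are made up for illustration, and the float4 / make_float4 definitions in the #else branch are minimal stand-ins that are only used when the file is not compiled by nvcc.

```cpp
// float4_ops.h: hypothetical header name, used here only for illustration.
#ifndef FLOAT4_OPS_H
#define FLOAT4_OPS_H

#include <cassert>

#ifdef __CUDACC__
// Compiled by nvcc: use the real CUDA vector type and qualifiers.
#include <cuda_runtime.h>
#define F4_HOST_DEVICE __host__ __device__
#else
// Compiled by a plain host compiler (assumption: a unit test built
// without nvcc). The type and factory below are minimal stand-ins.
#define F4_HOST_DEVICE
struct float4 { float x, y, z, w; };
inline float4 make_float4(float x, float y, float z, float w)
{
    float4 v = {x, y, z, w};
    return v;
}
#endif

// "inline" lets every file that includes this header carry the same
// definition without causing a multiple-definition linker error.
inline F4_HOST_DEVICE
float4 operator+(const float4 &a, const float4 &b)
{
    return make_float4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}

inline F4_HOST_DEVICE
float4 operator-(const float4 &a, const float4 &b)
{
    return make_float4(a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w);
}

// Host-side check of the overloads (hypothetical helper for unit tests).
inline void check_float4_ops()
{
    const float4 a = make_float4(1.0f, 2.0f, 3.0f, 4.0f);
    const float4 b = make_float4(4.0f, 3.0f, 2.0f, 1.0f);

    const float4 sum = a + b;
    assert(sum.x == 5.0f && sum.y == 5.0f && sum.z == 5.0f && sum.w == 5.0f);

    const float4 diff = a - b;
    assert(diff.x == -3.0f && diff.y == -1.0f && diff.z == 1.0f && diff.w == 3.0f);
}

#endif // FLOAT4_OPS_H
```

A test file would include this header and call check_float4_ops() from its main function or test case. The .cu files include the same header and get the device versions of the operators.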

CUDA by Example

Finally, after reading my first CUDA book, “Programming Massively Parallel Processors” (see here), I have finished this book as well, and I have to say I definitely read them in the wrong order.

This book gives you a very good introduction to the most important concepts of CUDA. Every chapter covers one or two concepts from the CUDA API, such as the different memory types, events, graphics interoperability with OpenGL, or streams.
As an introduction it gives you a very good overview of the most important things the API provides, and you get a firm understanding of what you have to write on your own and what is already available.

Usually each topic is introduced by showing a portion of code that solves a particular problem without using the concept that is being introduced. This code is then transformed step by step into the new solution by introducing the new concept and showing how it improves on the original. I liked this way of learning CUDA, because after studying these examples you know when to use which concept.

The important thing to say about this book is that it really is an introduction to the API functions and features. The book mentioned at the top (“Programming Massively Parallel Processors”) focuses more on the general concepts of the hardware and the language itself.

Set up Eclipse CDT with CMake and CUDA

Today I tried to set up Eclipse with CUDA and CMake integration. So I started writing the CMakeLists files and looked for ways to integrate the whole process into my IDE build.

And here is what I have done:

Set up CMake and CMakeLists files

First set up your CMake installation. I assume you have already downloaded the latest version of CMake and are ready to write your CMakeLists files.

My CMakeLists in the source root directory is actually quite simple. The most important part is to find CUDA. This is done by the FindCUDA.cmake module, which you can get here (the latest CMake build already includes FindCUDA.cmake). My code in the CMakeLists file then looks like this:

# cmake compatibility issues
CMAKE_MINIMUM_REQUIRED(VERSION 2.6)

# project name
PROJECT(CUDASandbox)

# version number
SET(VERSION_MAJOR 1)
SET(VERSION_MINOR 0)
SET(VERSION ${VERSION_MAJOR}.${VERSION_MINOR})

# project options
OPTION(INCLUDE_CUDA "Set to OFF to not search for CUDA" ON)

# set project build type
SET(CMAKE_BUILD_TYPE Debug)

# find project dependencies

#find cuda
IF(INCLUDE_CUDA)
    FIND_PACKAGE(CUDA)

    IF(CUDA_FOUND)
        MESSAGE("CUDA has been found")
    ELSE(CUDA_FOUND)
        MESSAGE(FATAL_ERROR "CUDA could not be found")
    ENDIF(CUDA_FOUND)

ENDIF(INCLUDE_CUDA)

# add subdirectories
ADD_SUBDIRECTORY(src)

Set up Eclipse

  1. Make a new Project
    C++ Project -> Makefile project -> Empty Project -> MacOSX GCC
  2. Make a build folder for out-of-source builds
    File-> New-> Folder -> Folder name: build
  3. Adjust the project properties
    Project -> Properties -> C/C++ Build
    - set the Build location to the created build folder
    - uncheck Generate Makefiles automatically
    - click OK to finish
  4. Edit your make targets
    On the right side of the editor window there should be a tab called Make Targets. Click this tab, select your project, then right click and select New…
    - set the target name to cmake
    - uncheck Same as the target name and Use builder settings
    - delete the text in the field Make target
    - write in Build command cmake -D CMAKE_BUILD_TYPE=debug -D CMAKE_CXX_FLAGS="-g -Wall" ..
  5. Now you can start coding. Before your first build and after every change to the CMakeLists files, double click the cmake target created in step 4. Afterwards compile your project as usual.

As you can see from the CMake file above, I used a “src” directory in my project structure to hold my CUDA source files. So to finish the Eclipse and CMake setup, add a “src” folder to your project and place in it, next to the source files, another CMakeLists file that looks as follows:

# src directory

# include directories
INCLUDE_DIRECTORIES(.)

# collect source files
FILE(GLOB source "*.cu")

# build and link
SET(EXE_NAME ${PROJECT_NAME})

CUDA_ADD_EXECUTABLE(${EXE_NAME} ${source})

Now the basic setup should be finished and ready to use.
Don’t forget: whenever you change something in your CMakeLists files or add new files, double click the cmake target on the right side and then hit the compile button to build. Of course this is a very simple CMake setup, but the aim of this post is only to show the basic Eclipse and CMake settings needed for a CUDA build process.

Programming Massively Parallel Processors

Although there are CUDA communities and one can find lots of information on CUDA on the web, for me that is still not the same as learning a topic by reading a book. Therefore I got some books on CUDA, and I would like to write short reviews of them as I finish reading each one.

I recently finished reading the book “Programming Massively Parallel Processors: A Hands-on Approach”. Initially I thought this book would be a good introduction to CUDA programming, but today I have to say it is not really an introduction. In my opinion the book has a nice start but then moves very quickly into the details of the GPU architecture and how to tune your program for high performance. I agree that in the end this is what CUDA is all about, but as an introduction I prefer to get a global overview first and then go into the details. In addition, the book never really says anything about the CUDA API functions or what the API actually provides. I missed this throughout the book.

By now it seems like I have criticized almost everything written in this book. But I am sure that once you have worked out the basics, the book has its positive sides too. The chapters about memory, performance and floating point considerations will certainly help you when you try to get the last bits of performance out of your code. After the ups and downs of the reading I was satisfied by the last chapter of the book, called “conclusion and future outlook”. It gives you a nice overview of the current state of CUDA and of what is and is not possible when programming CUDA. Even though it is a nice finish, I thought: “Why couldn’t you tell me about all that earlier in the book?”

A few days ago I started reading the new book on CUDA called “CUDA by Example”, and so far I think that as an introduction it is far better than this book, but I will report on that as soon as I have finished reading it.

Install CUDA on Mac OS X

Today I tried to install the CUDA tools on my MBP (MacBookPro6,2, with i7 and NVIDIA GeForce GT 330M). The installation according to the NVIDIA “Getting Started Guide Mac” worked perfectly. All of the samples compiled, but not all of them ran. When I tried to run the “deviceQuery” binary I got an error:

cudaGetDeviceCount FAILED CUDA Driver and Runtime version may be mismatched.
I guess the problem was the automatic graphics switching in the new MBP series: when I ran the program the built-in Intel graphics chip was active, which obviously could not do the job.
There is a nice little tool called gfxCardStatus available that fixed my problem. With this tool you can force your Mac to use either NVIDIA or Intel graphics only. After forcing my Mac to use NVIDIA graphics the deviceQuery program worked perfectly.