A few days ago I started writing a header file containing operator overloads, as __host__ __device__ functions, for the float4 vector type, because CUDA doesn't define any operators for it by default. So I wrote some code like this:
__host__ __device__
float4 operator+(const float4 &a, const float4 &b)
{
    return make_float4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}
The code worked perfectly, so naturally I wanted to reuse it: I included the header file in several .cu files to provide the operator overloads to different kernels. I also wanted to call the host version from my unit tests to verify that the implementation is correct. Unfortunately this didn't work. I got the classic linker error saying something like: “Multiple definitions of operator+(float4, float4)”
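After many hours of searching the web, I found the solution: prefix these functions with the keyword inline. Without it, every .cu file that includes the header emits its own external definition of operator+, so the linker sees the same symbol several times. Marking a function inline tells the compiler that identical definitions in multiple translation units are allowed, and the linker keeps just one of them. (Whether a call is actually inlined is still up to the compiler; what fixes the error is this linkage rule.)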
The new version looks almost the same, and everything works:
inline __host__ __device__
float4 operator+(const float4 &a, const float4 &b)
{
    return make_float4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}
Now you can include the header file wherever you need the operator overloads, reuse the device code in several kernels, and everything works smoothly.
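As a sketch of where this can lead, here is a header with a few more operators in the same inline style, followed by a host-side check like the unit tests mentioned above. The file name float4_ops.h, the test function test_float4_ops, and the fallback definitions of __host__, __device__, float4 and make_float4 for a plain C++ compiler are my own assumptions for illustration, not part of the CUDA API; when the file is compiled with nvcc, the real float4 and make_float4 from the CUDA headers are used instead.

```cpp
// float4_ops.h (hypothetical name): float4 operators callable from host and
// device code. Every function is `inline`, so the header can be included in
// any number of .cu/.cpp files without "multiple definition" link errors.
#ifndef FLOAT4_OPS_H
#define FLOAT4_OPS_H

#ifdef __CUDACC__
#include <cuda_runtime.h>  // real float4 and make_float4 under nvcc
#else
// Assumption: host-only builds (e.g. unit tests with g++) lack the CUDA
// headers, so provide stand-ins for the qualifiers, float4 and make_float4.
#define __host__
#define __device__
struct float4 { float x, y, z, w; };
inline float4 make_float4(float x, float y, float z, float w)
{
    float4 r = {x, y, z, w};
    return r;
}
#endif

// Component-wise addition.
inline __host__ __device__
float4 operator+(const float4 &a, const float4 &b)
{
    return make_float4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}

// Component-wise subtraction.
inline __host__ __device__
float4 operator-(const float4 &a, const float4 &b)
{
    return make_float4(a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w);
}

// Scale every component by s.
inline __host__ __device__
float4 operator*(const float4 &a, float s)
{
    return make_float4(a.x * s, a.y * s, a.z * s, a.w * s);
}

// Component-wise in-place addition.
inline __host__ __device__
float4 &operator+=(float4 &a, const float4 &b)
{
    a.x += b.x; a.y += b.y; a.z += b.z; a.w += b.w;
    return a;
}

#endif // FLOAT4_OPS_H

// ---- Host-side unit test (would normally live in its own test file) ----
#include <cassert>

void test_float4_ops()
{
    const float4 a = make_float4(1.0f, 2.0f, 3.0f, 4.0f);
    const float4 b = make_float4(0.5f, 0.5f, 0.5f, 0.5f);
    // All values are exactly representable as floats, so == is safe here.
    float4 c = a + b;
    assert(c.x == 1.5f && c.y == 2.5f && c.z == 3.5f && c.w == 4.5f);
    float4 d = a - b;
    assert(d.x == 0.5f && d.y == 1.5f && d.z == 2.5f && d.w == 3.5f);
    float4 e = a * 2.0f;
    assert(e.x == 2.0f && e.y == 4.0f && e.z == 6.0f && e.w == 8.0f);
    c += b;
    assert(c.x == 2.0f && c.y == 3.0f && c.z == 4.0f && c.w == 5.0f);
}
```

A test runner (or a plain main) only needs to call test_float4_ops(), while kernels compiled with nvcc use the very same operators on the device. Declaring the functions static instead of inline would also avoid the linker error, but then every translation unit that includes the header gets its own private copy of each function.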
Initially I thought this book would be a good introduction to CUDA programming, but today I have to say it is not really an introduction. In my opinion the book starts nicely but then dives very quickly into the details of the GPU architecture and how to tune your program for high performance. I agree that in the end this is what CUDA is all about, but in an introduction I prefer to get a global overview first and then go into the details. In addition, the book never really says anything about the CUDA API functions or what the API actually provides, and I missed that throughout the reading.