CUDA constant memory, namespace, and weird bugs

- 09/2012 - #Jumble, Misc

Edit: the usage of cudaMemcpyToSymbol described below is deprecated since CUDA 4.1 (see also my newer entry "Upgrade to CUDA 5.0: cudaMemcpyToSymbol invalid device symbol error").

Today I want to discuss some issues I had with CUDA constant memory and share some workarounds.

Constant memory is a useful feature that lets CUDA programmers share read-only data among all the threads of a kernel. Without constant memory one can declare and use a global variable like this:

__device__ float c_array[10];

__global__ void kernel(float* d_array){ d_array[0] = c_array[0]; }

void test(){
    float* d_array = 0;
    float h_array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    cudaMalloc((void**)&d_array, 10 * sizeof(float));
    // Copy the host array into the device variable c_array (string-based symbol lookup).
    cudaMemcpyToSymbol("c_array", h_array, sizeof(float)*10, 0, cudaMemcpyHostToDevice );
    kernel<<< 1, 1 >>>(d_array);
}

The global variable c_array is in global device memory. This type of memory is known to be the slowest memory you can access from a kernel. Instead one should prefer constant memory by putting __constant__ before c_array:

__device__ __constant__ float c_array[10];

Advantage: constant memory is cached (up to 8KB of cache), which speeds up data fetches.
Drawbacks: it is limited to 64KB and is read-only from within kernels.
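
If you would rather check the size limit on your own card than trust the figure above, the runtime can report it. This is only a minimal sketch: the helper name constant_memory_bytes is mine (hypothetical), while cudaGetDeviceProperties and the totalConstMem field come from the CUDA runtime API.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: returns the constant memory capacity of the given
// device in bytes, or 0 if the device properties cannot be queried.
size_t constant_memory_bytes(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return 0;
    }
    return prop.totalConstMem;
}
```

Calling constant_memory_bytes(0) and printing the result is enough to see how much room your __constant__ variables can take in total on the first GPU.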

[GPU specs]

I've been facing the run-time error "invalid device symbol" when doing something like:

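namespace Test_name {
__device__ __constant__ float c_array[10];

void test(){
    float h_array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    // This call returns the run-time error "invalid device symbol".
    cudaMemcpyToSymbol("c_array", h_array, sizeof(float)*10, 0, cudaMemcpyHostToDevice );
}
}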

I'm sure language designers would find the error trivial, but it took me some time to figure out what it was. The NVIDIA documentation does not really help here: explanations about constant memory and its usage are quite sparse...
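
Part of the difficulty is that the error only shows up if you look at the value returned by the call. Here is a small sketch of how one might surface it; check_cuda is a hypothetical helper name of mine, whereas cudaGetErrorString and the cudaErrorInvalidSymbol code are part of the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a readable message when a runtime call fails
// and hands the error code back to the caller unchanged.
inline cudaError_t check_cuda(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    }
    return err;
}
```

Wrapping the copy as check_cuda(cudaMemcpyToSymbol(...), "cudaMemcpyToSymbol") prints the message as soon as the symbol lookup fails, instead of leaving the kernel to read stale data silently.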

So here it goes: the CUDA runtime doesn't find the c_array symbol because it is declared within a namespace. The solution is to give the fully qualified symbol name:

cudaMemcpyToSymbol("Test_name::c_array", h_array, sizeof(float)*10, 0, cudaMemcpyHostToDevice );

A shame I had to guess this syntax because it is not described anywhere in the CUDA documentation. Or at least it is too well hidden for me to find it!
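
As the edit note at the top says, the string form is deprecated. With the non-deprecated form you pass the symbol itself rather than its name, so ordinary C++ name lookup takes care of the namespace. The following is a sketch under that assumption, not the code from my project: read_constant and upload_and_read are hypothetical names, and -1.0f is just a sentinel I chose to signal a failure.

```cuda
#include <cuda_runtime.h>

namespace Test_name {

__constant__ float c_array[10];

// Copies c_array[i] into d_out[0] so the host can verify the upload.
__global__ void read_constant(float* d_out, int i) { d_out[0] = c_array[i]; }

// Uploads a host array through the symbol itself (no string), then reads
// element i back through a kernel. Returns -1.0f if anything fails.
float upload_and_read(int i) {
    if (i < 0 || i >= 10) return -1.0f;
    float h_array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    float h_out = -1.0f;
    float* d_out = 0;
    if (cudaMemcpyToSymbol(c_array, h_array, sizeof(h_array)) != cudaSuccess) return -1.0f;
    if (cudaMalloc((void**)&d_out, sizeof(float)) != cudaSuccess) return -1.0f;
    read_constant<<<1, 1>>>(d_out, i);
    // cudaMemcpy waits for the kernel to finish before copying the result.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1.0f;
}

} // namespace Test_name
```

From outside the namespace the symbol would be written Test_name::c_array, exactly like any other C++ variable.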

You may face weird bugs with constant memory depending on your project and GPU architecture. Here are a few things you can try:

move constant variables outside of namespaces
add or remove the __device__ qualifier before your variable
put the cudaMemcpyToSymbol() call in the same .cu file as the constant variable
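
The last suggestion can be sketched as a pair of host helpers living right next to the variable, so that the rest of the project never touches the symbol directly. The file name and the helper names upload_c_array and download_c_array are hypothetical, and I am assuming a build without separate compilation, where a __constant__ variable is only visible inside its own .cu file.

```cuda
// constants.cu (hypothetical file name)
#include <cstddef>
#include <cuda_runtime.h>

__constant__ float c_array[10];

// Upload helper defined in the same file as c_array, so the symbol is
// always visible to cudaMemcpyToSymbol.
cudaError_t upload_c_array(const float* h_src, size_t count) {
    if (h_src == 0 || count > 10) return cudaErrorInvalidValue;
    return cudaMemcpyToSymbol(c_array, h_src, count * sizeof(float));
}

// Download helper, handy for checking what the device actually holds.
cudaError_t download_c_array(float* h_dst, size_t count) {
    if (h_dst == 0 || count > 10) return cudaErrorInvalidValue;
    return cudaMemcpyFromSymbol(h_dst, c_array, count * sizeof(float));
}
```

Other files then only need the two function declarations, and the kernels that read c_array stay in this same .cu file.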

Hope this will help.

Rodolphe Vaillant's homepage
