CUDA constant memory, namespace, and weird bugs
Edit: the usage of cudaMemcpyToSymbol
describded below is deprecated since CUDA 4.1 (See also my new entry Upgrade to CUDA 5.0: cudaMemcpyToSymbol invalid device symbol error)
Today I want to discuss some issues I had with CUDA constant memory and share some workarounds.
Constant memory is a useful feature that enables CUDA programmers to share data within a kernel. Without constant memory one can declare and use a global variable like this:
__device__ float c_array[10] __global__ void kernel(float* d_array){ d_array[0] = c_array[0]; } void test(){ float* d_array = 0; float h_array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; cudaMalloc((void**)&d_array, 10 * sizeof(float)); cudaMemcpyToSymbol("c_array", h_array, sizeof(float)*10, 0, cudaMemcpyHostToDevice ); kernel<<< 1, 1 >>>(d_array); }
The global variable c_array is in global device memory. This type of memory is known to be the slowest memory you can access from a kernel. Instead one should prefer constant memory by putting __constant__
before c_array
:
__device__ __constant__ float c_array[10]
- Advantage: constant memory is cached up to 8KB and will speed up data fetch.
- Drawbacks: it is limited to 64KB and is read only.
I've been facing the run-time error "invalid device symbol" when doing something like:
namespace Test_name { __device__ __constant__ float c_array[10] void test(){ cudaMemcpyToSymbol("c_array", h_array, sizeof(float)*10, 0, cudaMemcpyHostToDevice ); } }
I'm sure people doing language designs would find the error trivial but for me it took some time to figure out what it was. NVIDIA documentation is not really helping here. Explanations about constant memory and its usage are quiet sparse...
So here it goes: nvcc doesn't find the c_array
symbol because it is declared within a namespace. The solution is to give the correct symbol:
cudaMemcpyToSymbol("Test_name::c_array", h_array, sizeof(float)*10, 0, cudaMemcpyHostToDevice );
A shame I had to guess this syntax because it is not described anywhere in the CUDA documentation. Or at least it is too well hidden for me to find it!
You may face weird bugs with constant memory depending on your project and GPU architectures. Here is few things you can try:
- get constant variables outside namespaces
- add or remove
__device__
symbol before your variable - put the
cudaMemcpyToSymbol()
within the same.cu
of the constant variable
Hope this will help.
No comments