CUDA is a parallel computing platform and programming model for Nvidia GPUs (Graphics Processing Units). CUDA provides an interface to Nvidia GPUs through a variety of programming languages, libraries, and APIs.
0 votes · 0 answers · 5 views
How can I run and test the NVENC API on Linux CentOS?
We have a server with a Kepler graphics card and the NVIDIA driver already installed.
How can I run NVENC (hardware video encoding) and use its SDK on Linux CentOS 6.4?
How can I test whether it ...
0 votes · 1 answer · 17 views
Using libraries like Boost in CUDA device code
I am learning CUDA at the moment, and I am wondering whether it is possible to use functions from libraries and APIs like Boost in CUDA device code.
Note: I tried using std::cout and that did ...
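For context on the question above: host-only libraries such as Boost (and iostream facilities like std::cout) cannot be called from device code; only __device__-annotated functions and a small set of built-ins are available there. Device-side printf is one supported alternative. A minimal sketch (kernel and launch configuration are illustrative):

```cuda
#include <cstdio>

// std::cout and Boost are host-only; device-side printf (compute
// capability 2.0+) is the usual way to print from a kernel.
__global__ void helloKernel()
{
    printf("hello from thread %d\n", threadIdx.x);
}

int main()
{
    helloKernel<<<1, 4>>>();
    cudaDeviceSynchronize();  // flush device-side printf output
    return 0;
}
```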
-1 votes · 0 answers · 11 views
Constructing uniform grid on the GPU
I'm working on a simple ray tracer using CUDA and I've come across this paper detailing an algorithm to construct a uniform grid on the GPU. I've done half of it, but now I'm stuck, and there's no ...
1 vote · 1 answer · 10 views
About variable definition in CUDA
I have to load data from a file.
Each sample is 20-dimensional.
So I used this data structure to help me with this:
class DataType
{
    vector<float> d;
};
But while I use this variable ...
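A note on the entry above: std::vector cannot be used inside device code, so a common workaround is a POD struct with a fixed-size array, which is trivially copyable to the GPU. A hedged sketch (the name DataType and the dimension 20 come from the question; the kernel is illustrative):

```cuda
// A plain struct with a fixed-size array can be memcpy'd to device
// memory directly, unlike a struct holding a std::vector.
struct DataType
{
    float d[20];  // one 20-dimensional sample
};

__global__ void scaleKernel(DataType *samples, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int j = 0; j < 20; ++j)
            samples[i].d[j] *= factor;
}
```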
1 vote · 1 answer · 49 views
C++ data structures and CUDA
I have a structure which can be
struct type1 { double a, b, c; };
or it can be
struct type2 { double a, b, c, d, e; };
In the host function of my CUDA code I have something like
void compute(){
// ...
0 votes · 0 answers · 25 views
CUDA kernel launch failure when finding the min of an array using reduction
This is part of an online parallel programming course. I am using the CUDA code below to find the minimum in an image. The code is executed on the Udacity server. Since I do not have a GPU I ...
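The course code referenced above is not reproduced here, but a minimal shared-memory tree reduction for a block-wise minimum typically looks like the following sketch (kernel name and launch details are illustrative):

```cuda
#include <cfloat>  // FLT_MAX

// Each block reduces blockDim.x elements to one partial minimum;
// a second pass (or a host loop) reduces the per-block results.
__global__ void minReduce(const float *in, float *out, int n)
{
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + threadIdx.x;

    sdata[tid] = (i < n) ? in[i] : FLT_MAX;  // pad out-of-range threads
    __syncthreads();

    // Halve the active range each step, keeping the smaller element.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] = fminf(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = sdata[0];  // one partial minimum per block
}
```

The `extern __shared__` array requires the shared-memory size to be passed as the third launch parameter, e.g. `minReduce<<<blocks, threads, threads * sizeof(float)>>>(...)`.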
0 votes · 1 answer · 22 views
Correct way to use __constant__ memory in CUDA?
I have an array I would like to initialize in __constant__ memory on the CUDA device. I don't know its size or the values until runtime.
I know I can use __constant__ float Points[N][2] or ...
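On the question above: a __constant__ array must have a compile-time size, so the usual pattern is to declare a maximum size and fill only the used prefix at runtime with cudaMemcpyToSymbol. A sketch (MAX_POINTS is an illustrative limit, not from the question):

```cuda
#include <cuda_runtime.h>

#define MAX_POINTS 1024                    // compile-time upper bound
__constant__ float Points[MAX_POINTS][2];  // lives in constant memory

void uploadPoints(const float (*hostPoints)[2], int n)
{
    // n must be <= MAX_POINTS; only the first n entries are copied.
    cudaMemcpyToSymbol(Points, hostPoints, n * 2 * sizeof(float));
}
```

Kernels then read Points directly (together with n passed as a kernel argument) rather than taking it as a pointer parameter.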
1 vote · 0 answers · 30 views
CUDA reduction in nested for loops
I have a problem concerning a kind of reduction in CUDA.
distance is a matrix with gridSize*numberOfAngles elements; fftData is a matrix with numberOfAngles*NFFT elements. grid_magnitude is the ...
0 votes · 0 answers · 17 views
CUDAfy.NET with NSight, debugger not working
As the topic states, I can't get the debugger working. Below is the sequence of steps I've done.
Note: I have CUDA 5.0 installed and NSight Visual Studio Edition 3.0 installed. I've heard that it is ...
0 votes · 0 answers · 17 views
Thrust copy to device memory slow due to non-existing kernel launch
I have the following classes:
class host_list{
    host_vector<int> id;
    host_vector<int> weight;
    /* ...irrelevant functions and variables... */
    host_list& operator= (const ...
1 vote · 2 answers · 34 views
How to pass the address of a template kernel function to a CUDA function?
I want to use CUDA runtime API functions accepting CUDA kernel function pointers with kernel templates.
I am able to do the following without templates:
__global__ void myKernel()
{
...
}
void ...
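For the question above: once the template arguments are spelled out, taking the address of a template kernel instantiation works the same way as for a non-template kernel. A hedged sketch (the kernel body and the use of cudaFuncGetAttributes are illustrative):

```cuda
#include <cuda_runtime.h>

template <typename T>
__global__ void myKernel(T *data)
{
    data[threadIdx.x] += T(1);
}

void queryKernel()
{
    cudaFuncAttributes attr;
    // An explicit instantiation such as myKernel<float> is an ordinary
    // kernel and can be passed wherever a kernel pointer is expected.
    cudaFuncGetAttributes(&attr, myKernel<float>);
}
```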
-1 votes · 1 answer · 24 views
Where is the NVIDIA GPU Computing SDK on Mac? [on hold]
My Mac is running Lion and Xcode 4.6.3. I just updated Xcode today from 4.3.
I remember the SDK used to be in /Developer; however, that folder no longer exists.
So where can I find the NVIDIA GPU ...
0 votes · 1 answer · 29 views
About OpenMP and cudaSetDevice()
Does anyone know if the following usage of cudaSetDevice is correct? I want to repeatedly use resources created on different devices, at any time and from any host thread. Is there a way to do this in ...
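Relevant background for the entry above: since CUDA 4.0, any host thread may call cudaSetDevice at any time to change its current device, so a common OpenMP pattern is one host thread per GPU. A sketch (function name and the work inside the parallel region are illustrative):

```cuda
#include <cuda_runtime.h>
#include <omp.h>

void launchOnAllDevices()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    // One OpenMP thread per GPU; each thread binds itself to a device.
    #pragma omp parallel num_threads(deviceCount)
    {
        int dev = omp_get_thread_num();
        cudaSetDevice(dev);          // all later CUDA calls in this
                                     // thread target device `dev`
        // ... allocate, launch kernels, copy results on device `dev` ...
        cudaDeviceSynchronize();
    }
}
```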
0 votes · 1 answer · 29 views
CUDA 5.0 Replay Overhead
I am a novice CUDA programmer. I recently learned more about achieving better performance at lower occupancy. Here is a code snippet; I need help understanding a few things about replay overhead ...
1 vote · 1 answer · 25 views
What is the purpose of using multiple “arch” flags in Nvidia's NVCC compiler?
I've recently gotten my head around how NVCC compiles CUDA device code for different compute architectures.
From my understanding, when using NVCC's -gencode option, "arch" is the minimum compute ...