CUDA is a parallel computing platform and programming model for Nvidia GPUs (Graphics Processing Units). CUDA provides an interface to Nvidia GPUs through a variety of programming languages, libraries, and APIs.


0
votes
0 answers
5 views

How can I run and test NVENC API working on Linux CentOS?

We have a server with a Kepler graphics card and the NVIDIA driver already installed. How can I run NVENC (hardware video encoding) and use its SDK on Linux CentOS 6.4? How can I test that it is ...
0
votes
1 answer
17 views

Using libraries like boost in cuda device code

I am learning CUDA at the moment and I am wondering if it is possible to use functions from different libraries and APIs, like Boost, in CUDA device code. Note: I tried using std::cout and that did ...
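Host-only libraries such as Boost cannot be called from device code; only functions compiled for the device are available there. A minimal sketch (assuming compute capability 2.0 or later), where device-side printf is the supported replacement for std::cout:

```
#include <cstdio>

// std::cout is host-only; inside a kernel, printf is the supported
// way to produce output (compute capability 2.0 and later).
__global__ void hello_kernel() {
    printf("hello from thread %d\n", threadIdx.x);
}

int main() {
    hello_kernel<<<1, 4>>>();
    cudaDeviceSynchronize();  // flush device-side printf output
    return 0;
}
```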
-1
votes
0 answers
11 views

Constructing uniform grid on the GPU

I'm working on a simple ray tracer using CUDA and I've come across a paper detailing an algorithm to construct a uniform grid on the GPU. I've done half of it, but now I'm stuck, and there's no ...
1
vote
1 answer
10 views

About variable definitions in CUDA

I have to load data from a file. Each sample is 20-dimensional, so I used this data structure to help me with this: class DataType { vector<float> d; }; But while I use this variable ...
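A std::vector member cannot be dereferenced in device code, so a common approach is to flatten all samples into one contiguous host buffer that can be copied to the GPU with a single cudaMemcpy. A sketch of that idea (the names DIM and flatten are illustrative, not from the question):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t DIM = 20;  // each sample is 20-dimensional

// Flatten samples into one contiguous buffer; sample i, component j
// then lives at flat[i * DIM + j], ready for a single cudaMemcpy.
std::vector<float> flatten(const std::vector<std::vector<float>>& samples) {
    std::vector<float> flat;
    flat.reserve(samples.size() * DIM);
    for (const auto& s : samples) {
        assert(s.size() == DIM);
        flat.insert(flat.end(), s.begin(), s.end());
    }
    return flat;
}
```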
1
vote
1 answer
49 views

C++ data structures and CUDA

I have a structure which can be struct type1 { double a, b, c; }; or struct type2 { double a, b, c, d, e; }; In my host function of the CUDA code I have something like void compute() { // ...
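One way to make a single compute function accept either struct (an assumption about intent, not the asker's code) is to template it on the struct type and use only the members the two layouts share:

```cpp
#include <cassert>

// The two layouts from the question, with trailing semicolons added.
struct type1 { double a, b, c; };
struct type2 { double a, b, c, d, e; };

// Templating on the struct type lets one function handle both layouts,
// as long as it touches only members present in both (a, b, c here).
template <typename T>
double compute(const T& s) {
    return s.a + s.b + s.c;
}
```

The same template works for `__global__` kernels, with each instantiation compiled separately by nvcc.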
0
votes
0 answers
25 views

CUDA kernel launch failure in find min in an array using reduction

This is part of an online parallel programming course. I am using the CUDA code below to find the minimum in an image. The code is executed on the Udacity server. Since I do not have a GPU, I ...
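For context, a minimum over an array is usually computed with a shared-memory tree reduction. A generic sketch of that pattern (not the course's actual kernel; it assumes the block size is a power of two):

```
#include <cfloat>

// Per-block minimum via shared-memory tree reduction; one partial
// result per block is written to block_min, to be reduced again on
// the host or in a second kernel launch.
__global__ void min_reduce(const float* in, float* block_min, int n) {
    extern __shared__ float s[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    s[tid] = (i < n) ? in[i] : FLT_MAX;  // pad out-of-range threads
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            s[tid] = fminf(s[tid], s[tid + stride]);
        __syncthreads();
    }
    if (tid == 0) block_min[blockIdx.x] = s[0];
}
```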
0
votes
1 answer
22 views

Correct way to use __constant__ memory on CUDA?

I have an array I would like to initialize in __constant__ memory on the CUDA device. I don't know its size or the values until runtime. I know I can use __constant__ float Points[N][2] or ...
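A `__constant__` array must have a compile-time size, so one common pattern (a sketch, not the only answer) is to reserve a maximum and fill only the runtime-known prefix with cudaMemcpyToSymbol:

```
// Reserve a compile-time maximum; MAX_POINTS is an illustrative bound.
#define MAX_POINTS 1024
__constant__ float Points[MAX_POINTS][2];

// Copy the n points known at runtime into constant memory.
// Requires n <= MAX_POINTS; the tail of Points stays uninitialized.
void upload_points(const float (*host_pts)[2], int n) {
    cudaMemcpyToSymbol(Points, host_pts, n * 2 * sizeof(float));
}
```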
1
vote
0 answers
30 views

CUDA reduction in nested for loops

I have a problem concerning some kind of reduction in CUDA. distance is a matrix with gridSize*numberOfAngles elements, fftData is a matrix with numberOfAngles*NFFT elements. grid_magnitude is the ...
0
votes
0 answers
17 views

cudafy.net with NSight, debugger not working

As the title states, I can't get the debugger working. Below is the sequence of steps I've done. Note: I have CUDA 5.0 and Nsight Visual Studio Edition 3.0 installed. I've heard that it is ...
0
votes
0 answers
17 views

Thrust copy to device memory slow due to non-existent kernel launch

I have the following classes: class host_list{ host_vector<int> id; host_vector<int> weight; /*...irrelevant functions and variables...*/ host_list& operator= (const ...
1
vote
2 answers
34 views

How to pass the address of a template kernel function to a CUDA function?

I want to use CUDA runtime API functions that accept CUDA kernel function pointers with kernel templates. I am able to do the following without templates: __global__ void myKernel() { ... } void ...
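A template kernel has no address until it is instantiated, so the usual approach is to pass an explicit instantiation wherever a kernel pointer is expected. A hedged sketch using cudaFuncGetAttributes:

```
template <typename T>
__global__ void myKernel(T* data) { /* ... */ }

void query() {
    cudaFuncAttributes attr;
    // An explicit instantiation behaves like an ordinary kernel pointer:
    cudaFuncGetAttributes(&attr, myKernel<float>);
    // It can also be stored in a function-pointer variable first:
    void (*kptr)(float*) = myKernel<float>;
    cudaFuncGetAttributes(&attr, kptr);
}
```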
-1
votes
1 answer
24 views

Where is the NVIDIA GPU Computing SDK on Mac? [on hold]

My Mac is running Lion and Xcode 4.6.3. I just updated Xcode today from 4.3. I remember the SDK used to be in /Developer, however, that folder no longer exists. So where can I find NVIDIA GPU ...
0
votes
1 answer
29 views

About OpenMP and cudaSetDevice()

Does anyone know if the following usage of cudaSetDevice is correct? (I want to repeatedly call resources created on different devices at any time, in any host thread; is there a way to do this in ...
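cudaSetDevice is a per-host-thread setting, so with OpenMP each thread must select its device before touching that device's resources. A sketch of the usual pattern (not the asker's code):

```
#include <omp.h>

// One OpenMP thread per GPU; each thread binds itself to its device
// before launching work or touching memory created on that device.
void run_on_all_devices(int num_devices) {
    #pragma omp parallel num_threads(num_devices)
    {
        int dev = omp_get_thread_num();
        cudaSetDevice(dev);  // applies only to this host thread
        // ... launch kernels / use resources created on device dev ...
    }
}
```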
0
votes
1 answer
29 views

CUDA 5.0 Replay Overhead

I am a novice CUDA programmer. I recently learned more about achieving better performance at lower occupancy. Here is a code snippet; I need help understanding a few things about replay overhead ...
1
vote
1 answer
25 views

What is the purpose of using multiple “arch” flags in Nvidia's NVCC compiler?

I've recently gotten my head around how NVCC compiles CUDA device code for different compute architectures. From my understanding, when using NVCC's -gencode option, "arch" is the minimum compute ...
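As background for this question: with -gencode, "arch" names the virtual architecture that PTX is generated for, while "code" names the real architectures that machine code is embedded for. A sketch of a typical CUDA 5.x invocation (flag values are illustrative):

```
nvcc -gencode arch=compute_20,code=sm_20 \
     -gencode arch=compute_35,code=sm_35 \
     -o app app.cu
```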
