Tagged Questions

CUDA is a parallel computing platform and programming model for Nvidia GPUs (Graphics Processing Units). CUDA provides an interface to Nvidia GPUs through a variety of programming languages, libraries, and APIs.

2 answers, 2k views

Generating prime numbers using Sieve of Eratosthenes with CUDA

I'm learning CUDA and wrote a little program which generates prime numbers using the Sieve of Eratosthenes. (I know the limitations of CUDA, especially with memory sizes and limits, but this program is ...

asked Jul 11 '14 at 1:02

Unglued

787

1 answer, 248 views

C and CUDA: circular buffer implementation

I have a programme which uses many circular buffers in an identical fashion on a CPU and GPU (C and C/C++ CUDA). I essentially require many queues, however, due to this being run on a GPU, I have ...

c circular-list cuda

asked Dec 17 '15 at 8:57

James

935

1 answer, 295 views

3D vector CUDA kernel

I designed this CUDA kernel to compute a function on a 3D domain: p and Ap are 3D vectors that are actually implemented as a ...

c++ performance computational-geometry cuda

asked Jun 6 '14 at 14:46

repptilia

1363

2 answers, 437 views

Convert a 24bit bitmap to grayscale

I wrote this so I can learn CUDA. This is coded to work on my laptop's Nvidia GeForce GT 540M. Main points I need reviewed: CUDA programming conventions; performance, especially kernel speed; C ...

performance c image converting cuda

asked Mar 26 '15 at 17:59

JaDogg

1,7872843

1 answer, 255 views

Implementation of AES using CUDA

I am trying to implement AES on a GPU using CUDA. My implementation uses 4 T-boxes, which require 4 KB of GPU memory. I have used a 1 KB array for 1 KB of plaintext. First, all of the plaintext would ...

c++ performance aes cuda

asked Nov 17 '15 at 20:52

m.r226

261

1 answer, 154 views

CUDA Kernel - Neural Net

I'm building a spiking neural net (recurrent, integrate and fire), and I'm curious about how to reduce the warp divergence (and other problems) I may have. Here's an example with a few hand-placed ...

c++ neural-network cuda

asked Jul 5 '15 at 14:40

Hyllis

265

2 answers, 298 views

Calculating neurons and derivatives

This function is called very often. It begins with a cudaMemcpy, which is very slow. How can I change this function to avoid this? I already have ...

c++ performance memory-management cuda

asked Nov 16 '12 at 9:31

Robotex

1234

2 answers, 230 views

Calculating sum of primes using the CPU and GPU

I am a little baffled as to why the CUDA code runs about twice as slowly as the CPU version. I am just counting all the primes from 0 to (512 * 512 * 512). The CPU version executed in about 97 ...

c++ performance primes cuda

asked Aug 9 '15 at 2:42

chasep255

1603

1 answer, 47 views

A “policy-based” design for a generic CUDA kernel

I am faced with a design issue that has been discussed on SO several times, the most similar question being this one. Essentially, I want polymorphism for a CUDA kernel in the form of a "generic" ...

c++ oop template cuda

asked Apr 1 at 19:37

icurays1

1212

0 answers, 53 views

Parallel reduction by key implementations

I have an implementation of the reduction approach used in this document. I also (crudely) extended it so that I can reduce by key. In my setup I can assume that a ...

performance c cuda

asked Dec 30 '15 at 5:46

James

935

0 answers, 58 views

Calculating the distance between several spatial points

I am developing a CUDA program and I want to improve its performance. I have a kernel function which consumes more than 70% of the execution time. The kernel calculates the distance between several ...

c++ performance coordinate-system cuda

asked Jan 6 at 18:22

Siamak

112

1 answer, 60 views

CUDA brute force 48 bit key

I have a cryptographic function with two 24-bit keys. I have two blocks of input and two blocks of output, and want to brute force the keys using CUDA. Overview: The function is composed of two ...

c++ beginner time-limit-exceeded cryptography cuda

asked May 27 at 18:19

robertkin

303

0 answers, 69 views

Unwrapping multiple inner loops in CUDA for 4D nonlocal filter

I'm working on a kind of non-local means filtering in 4D space (x, y, z + time). The idea is to pass a chunk of a large 4D array to the GPU in order to process it and return a filtered 3D slice (then ...

performance image cuda

asked May 31 at 10:15

Daniel

0 answers, 36 views

Cuda C Matrix Compression

I am using CUDA to learn and implement a CSR matrix compression algorithm. What can I do better with respect to C best practices? main.c: ...

c matrix compression cuda

asked May 26 at 20:00

Craig Swearingen

234
