Frequent 'cuda parallel-processing' Questions

19

votes

2 answers

3k views

How do I make an already written concurrent program run on a GPU array?

I have a neural network written in Erlang, and I just bought a GeForce GTX 260 card with a 240-core GPU on it. Is it trivial to use CUDA as glue to run this on the graphics card?

asked Oct 17 '08 at 18:40

memius
1,20111017

2

votes

1 answer

722 views

For nested loops with CUDA

I'm having a problem with some nested for loops that I have to convert from C/C++ into CUDA. Basically I have 4 nested for loops which share the same array and perform bit shift operations. ...

c++ c for-loop cuda parallel-processing

asked Mar 29 '12 at 8:39

Dado
37519

4

votes

2 answers

2k views

How to measure the execution time of every block when using CUDA?

clock() is not accurate enough.

cuda gpu parallel-processing

asked Aug 24 '10 at 5:50

cnhk
365412

2

votes

1 answer

267 views

Does early exiting a thread disrupt synchronization among CUDA threads in a block?

I am implementing a certain image processing algorithm with CUDA and I have some questions about thread synchronization in general. The problem at hand can be explained like this: We have an ...

multithreading parallel-processing synchronization cuda

asked Jul 25 '12 at 16:16

Ufuk Can Biçici
1149

1

vote

1 answer

568 views

Shared memory matrix multiplication kernel

I am attempting to implement a shared memory based matrix multiplication kernel as outlined in the CUDA C Programming Guide. The following is the kernel: __global__ void matrixMultiplyShared(float * ...

c cuda parallel-processing gpu shared-memory

asked Dec 28 '12 at 23:47

Abraham P
1,164326

0

votes

1 answer

246 views

Dynamic programming in CUDA: global memory allocations to exchange data with child kernels

I have the following code: __global__ void interpolation(const double2* __restrict__ data, double2* __restrict__ result, const double* __restrict__ x, const double* __restrict__ y, const int N1, ...

cuda parallel-processing gpgpu

asked Feb 13 at 14:07

JackOLantern
859110

115

votes

16 answers

7k views

Why aren't we programming on the GPU? [closed]

So I finally took the time to learn CUDA and get it installed and configured on my computer and I have to say, I'm quite impressed! Here's how it does rendering the Mandelbrot set at 1280 x 678 ...

performance parallel-processing cuda gpgpu gpu-programming

asked Apr 3 '10 at 0:06

Chris
2,24021327

16

votes

16 answers

3k views

What future does the GPU have in computing? [closed]

Your CPU may be a quad-core, but did you know that some graphics cards today have over 200 cores? We've already seen what the GPUs in today's graphics cards can do when it comes to graphics. Now they ...

parallel-processing cuda gpu opencl

community wiki

6 revs
Steve Wortham

9

votes

2 answers

1k views

Python Multiprocessing with PyCUDA

I've got a problem that I want to split across multiple CUDA devices, but I suspect my current system architecture is holding me back. What I've set up is a GPU class, with functions that perform ...

python cuda parallel-processing multiprocessing pycuda

asked May 5 '11 at 22:33

Bolster
1,75421044

5

votes

3 answers

2k views

help me understand cuda

I am having some trouble understanding threads in the NVIDIA GPU architecture with CUDA. Could anybody please clarify the following: an 8800 GPU has 16 SMs with 8 SPs each, so we have 128 SPs. I was ...

cuda gpu parallel-processing

asked Feb 5 '10 at 12:37

scatman
262

5

votes

2 answers

564 views

What is the cheapest way to build an Erlang server farm (for a hobby project)? [closed]

Let's say we have an 'intrinsically parallel' problem to solve with our Erlang software. We have a lot of parallel processes and each of them executes sequential code (not number crunching) and the ...

erlang cuda parallel-processing

asked Sep 27 '11 at 11:49

Martin Lee
944319

4

votes

3 answers

370 views

CUDA, NPP Filters

The CUDA NPP library supports filtering of images using the nppiFilter_8u_C1R command, but I keep getting errors. I have no problem getting the boxFilterNPP sample code up and running. eStatusNPP = ...

c++ image-processing cuda parallel-processing convolution

asked Oct 8 '12 at 9:04

Steenstrup
655

3

votes

3 answers

1k views

CUDA - Implementing Device Hash Map?

Does anyone have any experience implementing a hash map on a CUDA Device? Specifically, I'm wondering how one might go about allocating memory on the Device and copying the result back to the Host, ...

cuda parallel-processing hashmap

asked Apr 3 '11 at 22:57

nedblorf
6222719

2

votes

1 answer

181 views

vector step addition slower on cuda

I am trying to run the vector step addition function in CUDA C++ code, but even for large float arrays of size 5,000,000 it runs slower than my CPU version. Below is the relevant CUDA and CPU code ...

c++ cuda parallel-processing gpu gpgpu

asked Mar 4 at 4:54

assassin
1,03941222

1

vote

1 answer

1k views

Realistic deadlock example in CUDA/OpenCL

For a tutorial I'm writing, I'm looking for a "realistic" and simple example of a deadlock caused by ignorance of SIMT / SIMD. I came up with this snippet, which seems to be a good example. Any ...

synchronization cuda parallel-processing opencl simd

asked Jun 21 '11 at 14:18

Framester
2,46822369
