Tagged Questions


See the tag entry for "gpu".

vote

0 answers

24 views

Longest common subsequence with CUDA

I've seen much ado made over computing the longest common subsequence using GPUs in academic literature, but never any actual code. Does anybody have some code (whatever the language) that implements ...

modified yesterday

Fred Milton
513

votes

1 answer

59 views

Linking with 3rd party CUDA libraries slows down cudaMalloc

It is not a secret that on CUDA 4.x the first call to cudaMalloc can be ridiculously slow (which was reported several times), seemingly a bug in CUDA drivers. Recently, I noticed weird behaviour: the ...

cuda gpu gpgpu gpu-programming

modified Jul 26 at 10:36

talonmies
15.7k11333

-2

votes

0 answers

26 views

Accepted Speedup for the GPU [closed]

I have implemented a string matching algorithm on the GPU using CUDA. The minimum speedup obtained is 5x, and the maximum is 21x. I cannot find what the accepted speedup is for using the GPU. It will ...

gpu gpgpu gpu-programming

modified Jul 24 at 2:18

GHAZAL RAHBARI
154

votes

1 answer

29 views

OpenCL: Targeting Work-group to a specific device

Assume that I have a multiprocessor machine. Can I bind my work-group to a specific device (processor)? Is there any API to accomplish this in OpenCL?

multiprocessing opencl cpu-architecture gpu-programming multiple-gpu

modified Jul 23 at 11:14

mfa
1,064118

vote

1 answer

99 views

Handling Ctrl+C exception with GPU

I am working with some GPU programs (using CUDA 4.1 and C), and sometimes (rarely) I have to kill the program midway using Ctrl+C to handle some exception. Earlier I tried using cudaDeviceReset() ...

memory-leaks cuda nvidia gpu-programming

modified Jul 23 at 8:15

P Marecki
8116

votes

1 answer

34 views

Installing GPUOCELOT under OSX Lion 10.7

I'm new to Stack Overflow. My question is about GPU Ocelot. Is anybody using it? Does it work on Unix (I'm using a MacBook Air with OS X 10.7)? I tried in many ways to install it but without ...

boost cuda emulator gpu-programming

modified Jul 22 at 8:47

Mark
14.7k31938

votes

2 answers

74 views

Creating a copy of the buffer pointed to by host ptr on the GPU from GPU kernel in OpenCL

I was trying to understand how exactly CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR work. Basically when using CL_MEM_USE_HOST_PTR, say in creating a 2D image, this will copy nothing to the device, ...

opencl gpu gpu-programming

modified Jul 12 at 22:03

Nike
426

votes

4 answers

2k views

CUDA kernels consistently returning bad results

I am a CUDA beginner who has successfully compiled and run several code samples using CUDA libraries such as CUFFT and CUBLAS. Lately, however, I have been trying to generate my own simple kernels ...

cuda gpu gpu-programming

modified Jul 12 at 22:02

codetwiddler
20616

votes

0 answers

37 views

cuda strange register usage with while loops

I have a cuda kernel wrapped in a while loop, like so: while (i < particles) { .... } When I change this to the following: while (true) { if (i >= particles) break; .... } ...

cuda gpgpu gpu-programming

modified Jul 10 at 17:08

smackee618
11

votes

2 answers

170 views

Efficient memcpy of large input in CUDA?

I have a problem with a program I'm writing using CUDA. I have an input array and an output array which I need to copy to device memory. The problem is that both arrays together are too large to fit ...

cuda memcpy gpu-programming

modified Jul 7 at 13:11

talonmies
15.7k11333

votes

4 answers

166 views

GPU reads from CPU or CPU writes to the GPU?

I am a beginner in parallel programming. I have a query which might seem silly, but I didn't get a definitive answer when I googled it. In GPU computing there is a device i.e. the GPU and ...

cuda opencl gpu gpu-programming

modified Jul 6 at 18:14

Nike
426

vote

1 answer

60 views

Read/Write OpenCL memory buffers on multiple GPU in a single context

Assume a system with two distinct GPUs, but from the same vendor so they can be accessed from a single OpenCL Platform. Given the following simplified OpenCL code: float* someRawData; cl_device_id ...

memory-management opencl gpu-programming

modified Jul 3 at 15:11

matthias
1737

votes

2 answers

226 views

Why is Arrayfun much faster than a for-loop when using GPU?

Could someone tell me why arrayfun is much faster than a for loop on the GPU? (Not on the CPU; a for loop is actually faster there.) Arrayfun: x = parallel.gpu.GPUArray(rand(512,512,64)); count = arrayfun(@(x) ...

matlab gpu-programming

modified Jul 3 at 5:44

David Cowden
79811

votes

0 answers

36 views

OpenCL Arithmetic [closed]

I have been given the task of optimizing the speed of an engine. The core part of this system is a Java class that performs a huge amount of computation on the fractional part of numbers. The engine becomes slow when ...

opencl gpu-programming

modified Jul 2 at 9:35

talonmies
15.7k11333

votes

1 answer

112 views

What happens when all threads of a warp read the same global memory?

I want to know what happens when all threads of a warp read the same 32-bit address of global memory. How many memory requests are there? Is there any serialization? The GPU is a Fermi card, the ...

cuda gpu gpgpu gpu-programming

modified Jun 24 at 6:16

Krazy Glew
464111


169 questions tagged gpu-programming

Related Tags

cuda× 82
gpu× 71
gpgpu× 45
opencl× 43
c++× 21
nvidia× 18
c× 11
parallel-processing× 10
matlab× 9
cuda-kernel× 7
opengl× 6
opencv× 4
c#× 4
algorithm× 4
.net× 3
osx× 3
visual-studio× 3
debugging× 3
image-processing× 3
c++-amp× 3
multithreading× 3
performance× 3
python× 2
fft× 2
string× 2
