See the tag entry for "gpu".
1
vote
0answers
24 views
Longest common subsequence with CUDA
I've seen much ado made over computing the longest common subsequence using GPUs in academic literature, but never any actual code. Does anybody have some code (whatever the language) that implements ...
2
votes
1answer
59 views
Linking with 3rd party CUDA libraries slows down cudaMalloc
It is no secret that on CUDA 4.x the first call to cudaMalloc can be ridiculously slow (this has been reported several times), seemingly a bug in the CUDA drivers.
Recently, I noticed weird behaviour: the ...
-2
votes
0answers
26 views
Accepted Speedup for the GPU [closed]
I have implemented a string matching algorithm on the GPU using CUDA. The minimum obtained speedup is 5x, and the maximum is 21x.
I cannot find what the accepted speedup for using the GPU is. It will ...
0
votes
1answer
29 views
OpenCL: Targeting a work-group to a specific device
Assume that I have a multiprocessor machine. Can I bind my work-group to a specific device (processor)?
Is there any API to accomplish this in OpenCL?
1
vote
1answer
99 views
Handling Ctrl+C exception with GPU
I am working with some GPU programs (using CUDA 4.1 and C), and sometimes (rarely) I have to kill the program midway using Ctrl+C to handle some exception. Earlier I tried using cudaDeviceReset() ...
0
votes
1answer
34 views
Installing GPUOCELOT under OSX Lion 10.7
I'm new to Stack Overflow. My question is about GPUOcelot. Is anybody using it? Does it work on a Unix OS (I'm using a MacBook Air with OS X 10.7)? I tried to install it in many ways but without ...
0
votes
2answers
74 views
Creating a copy of the buffer pointed to by host ptr on the GPU from a GPU kernel in OpenCL
I was trying to understand how exactly CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR work.
Basically when using CL_MEM_USE_HOST_PTR, say in creating a 2D image, this will copy nothing to the device, ...
0
votes
4answers
2k views
CUDA kernels consistently returning bad results
I am a CUDA beginner who has successfully compiled and run several code samples using CUDA libraries such as CUFFT and CUBLAS. Lately, however, I have been trying to generate my own simple kernels ...
0
votes
0answers
37 views
CUDA: strange register usage with while loops
I have a CUDA kernel wrapped in a while loop, like so:
while (i < particles) {
....
}
When I change this to the following:
while (true) {
if (i >= particles) break;
....
}
...
0
votes
2answers
170 views
Efficient memcpy of large input in CUDA?
I have a problem with a program I'm writing using CUDA. I have an input array and an output array which I need to copy to device memory. The problem is that both arrays together are too large to fit ...
4
votes
4answers
166 views
GPU reads from CPU or CPU writes to the GPU?
I am a beginner in parallel programming. I have a question that might seem silly, but I didn't get a definitive answer when I googled it.
In GPU computing there is a device, i.e. the GPU, and ...
1
vote
1answer
60 views
Read/Write OpenCL memory buffers on multiple GPUs in a single context
Assume a system with two distinct GPUs, but from the same vendor so they can be accessed from a single OpenCL Platform. Given the following simplified OpenCL code:
float* someRawData;
cl_device_id ...
2
votes
2answers
226 views
Why is arrayfun much faster than a for-loop when using the GPU?
Could someone tell me why arrayfun is much faster than a for-loop on the GPU? (Not on the CPU; there a for-loop is actually faster.)
Arrayfun:
x = parallel.gpu.GPUArray(rand(512,512,64));
count = arrayfun(@(x) ...
0
votes
0answers
36 views
OpenCL Arithmetic [closed]
I have been given the task of optimizing the speed of an engine. The core part of this system is a Java class which does a huge amount of computation on the fractional part of numbers. The engine becomes slow when ...
2
votes
1answer
112 views
What happens when all threads of a warp read the same global memory?
I want to know what happens when all threads of a warp read the same 32-bit address of global memory. How many memory requests are there? Is there any serialization? The GPU is a Fermi card, the ...