Newest 'gpu-programming gpu' Questions

0

votes

0 answers

7 views

Is there any good tutorial or reference for writing code with Magma?

Currently I am trying to use Magma to do matrix operations on the GPU; however, I have found little documentation about it. The only things I can refer to are its testing programs and the online generated documentation (here), ...

gpu gpu-programming

asked Jul 10 at 14:30

itsuper7
11819

0

votes

1 answer

24 views

How does multithreading in GPUs work?

How does a GPU handle multithreading? In CPUs, for example, there are independent copies of the register file for each thread. But with register files as large as those in GPUs, that would be impossible. So ...

asked Jun 29 at 14:00

Mohammad Ewais
534

0

votes

0 answers

5 views

Running AMD GPU Assembly

I am trying to run AMD GPU Assembly on my PC. I am using Ubuntu 12.04 64-bit and Windows 7 Ultimate. I am using a 6XX GPU. Please tell me how to run it. Links to good resources would also be helpful. If you can ...

gpu amd gpu-programming isa

asked Jun 24 at 9:56

Fr34K
142111

2

votes

1 answer

41 views

CUDA: How does Thrust manage memory when using a Comparator in a sorting function?

I have a char array of 10 characters that I would like to pass as an argument to a comparator which will be used by Thrust's sorting function. In order to allocate memory for this array I use ...

sorting cuda gpu thrust gpu-programming

asked Jun 23 at 11:16

ksm001
656129

2

votes

1 answer

59 views

Is it possible to deallocate memory for the N last elements of a thrust::device_vector without using resize?

I'm using a device_vector in order to store information about an array of user input data. This information is necessary in order to speed things up when I call the second kernel, which runs the main ...

cuda gpu thrust gpu-programming

asked May 21 at 13:52

ksm001
656129

3

votes

5 answers

259 views

High level GPU programming in C++

I've been looking into libraries/extensions for C++ that will allow GPU-based processing on a high level. I'm not an expert in GPU programming and I don't want to dig too deep. I have a neural network ...

c++ cuda gpu gpu-programming

asked May 8 at 10:17

goocreations
319414

1

vote

2 answers

69 views

Am I right in thinking that modern consumer graphics cards use exactly the same GPU structures for actual graphics rendering and bare computations?

Am I right in thinking that modern consumer graphics cards (say those conventional nVidia and ATi models) use exactly the same GPU structures and operations for actual graphics rendering (through ...

driver hardware gpu gpu-programming firmware

asked Apr 26 at 20:09

Ivan
8,601544136

0

votes

1 answer

103 views

CUDA kernel with reduction - logic errors for dot product of 2 matrices

I am just starting off with CUDA and am trying to wrap my brain around the CUDA reduction algorithm. In my case, I have been trying to get the dot product of two matrices. But I am getting the right ...

cuda gpu gpu-programming reduction

asked Mar 30 at 2:08

Bhrugesh Patel
617729

0

votes

0 answers

139 views

CUDA appears to be extremely slow

Taking my first steps in CUDA, I tried this simple example code, which runs perfectly fine but appears to be extremely slow. I compiled it using nvcc version 5.0 using the commands: $ ...

cuda gpu gpgpu gpu-programming

asked Mar 27 at 20:44

Pantelis Sopasakis
513513

0

votes

1 answer

140 views

CUDA FFT exception

I'm trying to use the CUDA FFT (aka cufft) library. The problem occurs when cufftPlan1d(..) throws an exception. #define NX 256 #define BATCH 10 cufftHandle plan; cufftComplex *data; ...

c++ cuda gpu gpgpu gpu-programming

asked Mar 21 at 10:52

TripleS
303413

0

votes

0 answers

32 views

GPU selection on sending image

I have a strange situation. I installed 2 video cards in the same computer, and now I have to send images/frames through these video cards. But I have no idea how to select a GPU for sending my data ...

c# gpu gpu-programming

asked Mar 17 at 8:54

mst
1

0

votes

0 answers

85 views

Scattering on CUDA

I'm trying to implement the following: for (unsigned int j = 0; j < numElems; ++j) { unsigned int bin = (input[j] & mask) >> offset; output[source[bin]] = input[j]; source[bin]++; ...

design-patterns cuda gpu gpu-programming scatter

asked Mar 5 at 13:43

facunvd
615114

1

vote

2 answers

350 views

Accessing GPU via web browser

I came across this proof of concept earlier today (on TechCrunch.com) and was blown away and intrigued as to how they had managed to accomplish the end result. They state that they don't use WebGL or ...

javascript 3d gpu gpu-programming

asked Mar 4 at 23:36

Giles Thompson
1447

-1

votes

1 answer

85 views

Reading CUDA data in burst mode

I currently have CUDA code that is performing around 3-4x slower than CPU code. I removed all extraneous CPU/GPU transfers so that most of the computation is being done on the GPU, and only the final ...

c++ c cuda gpu gpu-programming

asked Feb 27 at 22:06

assassin
1,02941222

0

votes

1 answer

96 views

Error CL_OUT_OF_RESOURCES while reading data back into host memory when using an atomic function in an OpenCL kernel

I am trying to implement atomic functions in my OpenCL kernel. The multiple threads I create are trying to write to a single memory location in parallel. I want them to execute serially on that ...

opencl gpu gpgpu gpu-programming

asked Feb 5 at 5:59

sandeep.ganage
12612

1

vote

1 answer

145 views

CUDA not so fast against CPU with OpenMP?

I am trying to compute cross-correlation amongst 450 vectors each of size 20000. While doing this on the CPU, I stored the data in a 2D matrix with rows=20000 and cols=450. The serial code for the ...

cuda gpu gpgpu gpu-programming

asked Jan 31 at 19:41

mkuse
37913

1

vote

0 answers

61 views

Number of working GPU SMs [closed]

Is it possible to monitor the number of SMs free at a given point in time? How is gpu_sm_speed calculated? Is that the average or of individual SMs (I guess the execution time of each SM can be ...

cuda gpu gpgpu nvidia gpu-programming

asked Jan 31 at 5:26

user1550304
94

0

votes

1 answer

204 views

GPU gives no performance improvement in Julia set computation

I am trying to compare CPU and GPU performance. I have CPU: Intel® Core™ i5 CPU M 480 @ 2.67GHz × 4; GPU: NVIDIA GeForce GT 420M. I can confirm that the GPU is configured and works correctly with ...

cuda gpu gpgpu gpu-programming

asked Jan 30 at 11:28

mkuse
37913

0

votes

1 answer

173 views

OpenCL local memory: only half the threads in a group execute correctly

I have written a kernel in OpenCL using local memory to get faster execution. This is the first time I am using local memory. My global_work_size = 16 and local_work_size = 8. OpenCL kernel: ...

opencl gpu gpgpu gpu-programming

asked Jan 28 at 13:53

sandeep.ganage
12612

0

votes

1 answer

86 views

How to allocate all of the available shared memory to a single block in CUDA?

I want to allocate all the available shared memory of an SM to one block. I am doing this because I don't want multiple blocks to be assigned to the same SM. My GPU card has 64KB (Shared+L1) memory. ...

memory cuda gpu gpu-programming

asked Jan 22 at 19:30

Iman
83

0

votes

1 answer

138 views

Race condition in OpenCL kernel threads

If multiple threads are simultaneously writing to a single memory location, there will be a race condition, right? In my case the same thing is happening. Consider a module from 'reduce.cl': int i = ...

opencl gpu gpgpu gpu-programming

asked Jan 21 at 7:58

sandeep.ganage
12612

0

votes

0 answers

203 views

Per vertex mesh deformation

I am doing a project where I want to have a vertex buffer (in OpenGL) whose vertices make up a mesh of an image, meaning that each pixel of the image consists of two triangles (a square ...

opengl graphics computer-vision gpu gpu-programming

asked Dec 16 '12 at 20:35

user1823350
406

0

votes

2 answers

756 views

Disable Force GPU rendering programmatically

I want to disable Force GPU rendering in my Android program. Right now I have to go to the settings on the device and disable it, but that is hard for my users.

android android-layout gpu gpu-programming

asked Dec 3 '12 at 8:42

nargess reyahi
174

1

vote

1 answer

63 views

What should I consider when choosing a Video Card for GPGPU [closed]

What are the key things to consider when looking for a video card to be used with C++ AMP? I can't afford a high end compute dedicated GPU or workstation GPU so I'm looking at cards in the sub $600 ...

gpu gpgpu gpu-programming

asked Nov 29 '12 at 20:32

Matthew Crews
569315

0

votes

1 answer

175 views

Which is better: a loop inside the kernel or looping over kernels on a CUDA GPU?

Device: GeForce GTX 680. In the program, I have a very long array to be processed inside the kernel (approx. 1 GB of integers). As needed, my array is divided into blocks sequentially with some ...

cuda gpu gpu-programming

asked Nov 25 '12 at 17:01

user1352179
206

1

vote

1 answer

74 views

Previous values are retained in global memory for kernel arguments across different runs on a CUDA GPU

Device: GeForce GTX 680. In my program, a value is copied from the host to a device variable using CUDA Memcpy. I can see that previous values are retained in global memory across different executions of ...

cuda gpu gpu-programming

asked Nov 23 '12 at 21:03

user1352179
206

1

vote

1 answer

138 views

Optimizing a CUDA kernel for normalisation of an array

I'm trying to normalise the array as follows. Pick the first two elements of the array, find their sum, and divide each of them by that sum. Do the same for the rest of the elements. It works fine. But when ...

cuda gpu gpgpu gpu-programming pycuda

asked Nov 14 '12 at 10:37

Muthu
283

3

votes

2 answers

2k views

How to configure OpenCL in Visual Studio 2010 for NVIDIA's GPU on Windows?

I am using NVIDIA's GeForce GTX 480 GPU on the Windows operating system. I have already configured Visual Studio 2010 for CUDA 4.2. How do I configure OpenCL for NVIDIA's GPU in Visual Studio 2010? ...

cuda opencl gpu gpgpu gpu-programming

asked Oct 16 '12 at 15:35

sandeep.ganage
12612

2

votes

1 answer

334 views

How many OpenCL registers do the ATI Radeon HD 6750M and 6970M have?

I cannot find any information about the number of registers in the ATI Radeon HD 6750M and 6970M GPUs. I want to optimize my OpenCL kernels to utilize as many processing units as possible, so I need to ...

opencl gpu gpu-programming

asked Sep 30 '12 at 10:06

Pavel
184

-1

votes

1 answer

79 views

Bad results with GPU program [closed]

I am not getting good results with iterative equation solving. I am using a 2D array with "size_y" rows with "size_x" elements for each row. The problem is that the code only does one iteration ...

opencl gpu gpu-programming

asked Sep 13 '12 at 2:19

user1610662
112

0

votes

2 answers

223 views

Optimization tips for CUDA code

I wrote a piece of code for computing the Self Quotient Image (SQI) in MATLAB. Now I want to rewrite part of it in parallel for speedup. This part of the code is: siz=15; X=normalize8(X); ...

c++ matlab cuda gpu gpu-programming

asked Sep 6 '12 at 7:46

Mbt925
404113

0

votes

2 answers

228 views

Cannot read out values from texture memory

Hi, I'm writing a simple program to practice working with texture memory. I just want to write my data into texture memory and write it back into global memory, but I cannot read out the values. ...

cuda texture gpu textures gpu-programming

asked Sep 4 '12 at 13:17

Silve2611
877

3

votes

1 answer

214 views

How does the speed of a CUDA program scale with the number of blocks?

I am working on a Tesla C1060, which contains 240 processor cores with compute capability 1.3. Knowing that every 8 cores are controlled by a single multiprocessor, and that each block of threads is ...

cuda gpu gpgpu gpu-programming

asked Aug 29 '12 at 14:43

Tarek
30829

0

votes

2 answers

734 views

CUDA kernel doesn't launch

My problem is very much like this one. I run the simplest CUDA program but the kernel doesn't launch. However, I am sure that my CUDA installation is ok, since I can run complicated CUDA projects ...

cuda gpu gpgpu gpu-programming

asked Aug 28 '12 at 17:12

Tarek
30829

0

votes

1 answer

154 views

CUDA invalid configuration error 9

I have a CUDA application; after first allocating CUDA memory for various arrays, the program loops through: transfer data to the GPU, process kernels on the GPU, transfer data back from the GPU. The first data ...

gpu gpgpu nvidia gpu-programming

asked Aug 27 '12 at 22:33

JPM
956

0

votes

1 answer

156 views

Saving values after calculating with texture memory

Hi, I have a simple calculation using texture memory, but I am not able to save the right results. The result should be an interpolation. For example, with angle = 0.5, A[0] = 1, B[0] = 2, result[0] should be ...

cuda texture gpu textures gpu-programming

asked Aug 27 '12 at 12:29

Silve2611
877

3

votes

4 answers

254 views

Accuracy of GPU for scientific computing

An electrical engineer recently cautioned me against using GPUs for scientific computing (e.g. where accuracy really matters) on the basis that there are no hardware safeguards like there are in a ...

gpu gpu-programming

asked Aug 24 '12 at 14:28

Ari B. Friedman
18.6k35187

0

votes

2 answers

724 views

Interpolation with CUDA texture memory

I would like to use texture memory for interpolation of data. I have 2 arrays and I want to interpolate data between them (between A[i] and B[i]). Now I thought I could bind them to texture ...

cuda gpu textures gpu-programming cuda.net

asked Aug 22 '12 at 8:49

Silve2611
877

4

votes

4 answers

553 views

Which Java code can be moved to the GPU?

The rootbeer framework makes GPU programming possible in Java. Which Java code should be run with rootbeer, and which code is better run in the Java VM itself? Or, put differently: which code produces ...

java gpu gpu-programming rootbeer

asked Aug 12 '12 at 18:11

Horcrux7
6,70793559

3

votes

1 answer

422 views

Linking with 3rd party CUDA libraries slows down cudaMalloc

It is not a secret that on CUDA 4.x the first call to cudaMalloc can be ridiculously slow (which was reported several times), seemingly a bug in CUDA drivers. Recently, I noticed weird behaviour: the ...

cuda gpu gpgpu gpu-programming

asked Jul 26 '12 at 7:44

asm
93710

0

votes

2 answers

189 views

Creating a copy of the buffer pointed to by host ptr on the GPU from GPU kernel in OpenCL

I was trying to understand how exactly CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR work. Basically when using CL_MEM_USE_HOST_PTR, say in creating a 2D image, this will copy nothing to the device, ...

opencl gpu gpu-programming

asked Jul 8 '12 at 5:34

Nike
1007

5

votes

4 answers

368 views

GPU reads from CPU or CPU writes to the GPU?

I am a beginner in parallel programming. I have a query which might seem silly, but I didn't get a definitive answer when I googled it. In GPU computing there is a device i.e. the GPU and ...

cuda opencl gpu gpu-programming

asked Jul 2 '12 at 19:14

Nike
1007

2

votes

1 answer

180 views

About the compact operation in cudpp

The following kernel function is the compact operation in cudpp, a CUDA library (http://gpgpu.org/developer/cudpp). My question is why the developer repeats the writing part 8 times? And why ...

cuda gpu gpgpu gpu-programming

asked Jun 1 '12 at 17:48

Fan Zhang
24418

2

votes

4 answers

1k views

Nsight skips over (ignores) breakpoints in VS10: CUDA works fine, but Nsight consistently skips over several breakpoints

I'm using Nsight 2.2, Toolkit 4.2, and the latest NVIDIA driver, with a couple of GPUs in my computer. Build customization 4.2. I have set "generate GPU output" in the CUDA project properties, Nsight monitor is ...

cuda gpu gpgpu gpu-programming nsight

asked May 31 '12 at 5:39

TripleS
303413

3

votes

1 answer

373 views

What happens when all threads of a warp read the same global memory?

I want to know what happens when all threads of a warp read the same 32-bit address of global memory. How many memory requests are there? Is there any serialization? The GPU is a Fermi card, the ...

cuda gpu gpgpu gpu-programming

asked May 24 '12 at 7:42

Fan Zhang
24418

0

votes

0 answers

188 views

Calling cutilExit(argc, argv) causes an error

At the end of the DLL (a slightly modified example), cutilExit(argc, argv) is called and causes: Error when parsing command line argument string. I have no idea what the problem is and am not sure which ...

cuda gpu gpu-programming

asked May 22 '12 at 17:30

user1281071
16119

2

votes

1 answer

93 views

Is there a way to limit or prioritize how much processing power an OpenCL application can use?

First, I'm not even an OpenCL newbie-- I know what it is but I haven't so much as written one line of code. However, I have looked through some OpenCL on a very simple, open-source project and ...

osx opencl gpu gpu-programming

asked May 17 '12 at 12:40

RLH
3,02711659

0

votes

1 answer

398 views

Using OpenCV with GPU that is not factory built-in? [closed]

I want to speed up my OpenCV based software for real-time operation using OpenCV's GPU support library. My computer does not have a built-in GPU supported by OpenCV, so here are my questions: ...

c++ opencv cuda gpu gpu-programming

asked May 11 '12 at 13:40

Dyps
235

0

votes

2 answers

356 views

How to reduce the branch divergence of binary search using CUDA

The application is to intersect two sorted lists of integers (set intersection), say list1 and list2. Each element of list1 will be assigned to a GPU thread, which does a binary search to check whether it ...

algorithm cuda gpu gpgpu gpu-programming

asked Apr 30 '12 at 19:19

Fan Zhang
24418

2

votes

1 answer

3k views

What is the difference between “-arch sm_13” and “-arch sm_20”?

I need double precision calculation in my application. According to what I found on Google, I should add a flag "-arch sm_13" or "-arch sm_20". Q1: What is the difference between "-arch sm_13" and "-arch ...

cuda gpu gpu-programming

asked Apr 26 '12 at 9:18

user1281071
16119
