Newest 'gpu-programming' Questions

-1 votes, 0 answers, 21 views

PyCUDA “CUDA compiler succeeded, but said the following…” warning message in Linux. Is PyCUDA properly installed? [closed]

I have compiled PyCUDA in Ubuntu 12.04, and only some of the demo PyCUDA examples work fine. I can run the hello_gpu.py script and a few wiki examples like MatrixmulSimple.py. But I get errors for ...

asked Jun 4 at 15:08 by arvind

-2 votes, 0 answers, 37 views

What are the basic concepts used in cuBLAS sgemm?

I am wondering which optimizations are used in cuBLAS's sgemm implementation. Let's start out with the naive implementation, one thread per matrix entry: __global__ void sgemm_naive(float alpha, float ...

algorithm cuda gpgpu matrix-multiplication gpu-programming

asked May 31 at 22:37 by niklasfi
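For reference, the work done by the naive "one thread per matrix entry" kernel the sgemm question starts from can be written as a sequential C++ loop nest, where each (row, col) iteration is what one thread would compute. This is a minimal sketch assuming row-major storage (cuBLAS itself uses column-major) and the usual SGEMM form C = alpha * A * B + beta * C; `sgemm_reference` is a hypothetical name, not a cuBLAS function. A CPU reference like this is mainly useful for checking GPU results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C = alpha * A * B + beta * C for row-major matrices:
// A is M x K, B is K x N, C is M x N. Each (row, col) iteration of the
// two outer loops corresponds to one thread of a naive kernel.
void sgemm_reference(std::size_t M, std::size_t N, std::size_t K,
                     float alpha, const std::vector<float>& A,
                     const std::vector<float>& B,
                     float beta, std::vector<float>& C) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for (std::size_t row = 0; row < M; ++row) {
        for (std::size_t col = 0; col < N; ++col) {
            float acc = 0.0f;  // dot product of row of A with column of B
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[row * K + k] * B[k * N + col];
            }
            C[row * N + col] = alpha * acc + beta * C[row * N + col];
        }
    }
}
```

The optimizations the question asks about (shared-memory tiling, register blocking and so on) all compute this same result while reusing loaded elements of A and B across many output entries.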

0 votes, 1 answer, 43 views

CUDA: Allocate memory for auxiliary data to the shared memory of each block efficiently

Suppose that we have an array int *data, and each thread will access one element of this array. Since this array will be shared among all threads, it will be stored in global memory. Let's create ...

cuda shared-memory gpu-programming

asked May 25 at 23:36 by ksm001

0 votes, 1 answer, 41 views

OpenCL pixel check

I have an OpenCL kernel that does some warping on images. This is a forward mapping, and each kernel instance handles the mapping/warping of one pixel in the source image. This means that some ...

computer-vision opencl gpgpu gpu-programming

asked May 24 at 13:08 by user1823350

0 votes, 1 answer, 44 views

Platforms in OpenCL

I have an NVIDIA graphics card (GeForce GT 640) on my motherboard. I have installed OpenCL on my box. When I query the platform using "clGetPlatformInfo(parameters)", I see the following output: ...

opencl gpgpu gpu-programming

asked May 23 at 9:12 by Rohit Sarewar

1 vote, 0 answers, 66 views

cudaMalloc not working

I'm writing code to transfer a 3D array from host to device, edit the array on the device, and transfer all of the memory back. I've cut it down to the core code shown below, but I still cannot get this to ...

cuda gpgpu gpu-programming

asked May 22 at 19:39 by john smith

0 votes, 1 answer, 44 views

OpenGL shaders: generate colors

I have a 10x10-pixel sprite. How can I change its colors in real time with a shader program? All the blue on the sprite should turn green, all the green should turn white, etc. ...

opengl shader gpu-programming fragment-shader vertex-shader

asked May 22 at 7:26 by Dino Balloons

0 votes, 1 answer, 46 views

OpenCV 2.4.4 with CUDA: registerPageLocked fails

I am trying to page-lock a Mat that has already been created. Consider the following example code: ... Mat cpuGray; GpuMat gpuGray; cv::cvtColor(cpuColor, cpuGray, CV_BGR2GRAY); ...

opencv cuda gpu-programming

asked May 21 at 21:29 by user2407197

2 votes, 1 answer, 54 views

Is it possible to deallocate memory for the N last elements of a thrust::device_vector without using resize?

I'm using a device_vector in order to store information about an array of user input data. This information is necessary in order to speed things up when I call the second kernel, which runs the main ...

cuda gpu thrust gpu-programming

asked May 21 at 13:52 by ksm001

0 votes, 0 answers, 34 views

Is it possible to use an array of bits or something like a std::bitset in CUDA?

I have an input array that is given to a kernel. Each thread works with one value of the array and either changes the value or doesn't change it at all according to an algorithm. I would like to find ...

cuda bit gpu-programming bitset

asked May 19 at 11:38 by ksm001
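One common way to get a bit array in CUDA is to pack bits into 32-bit words yourself rather than relying on std::bitset. The host-side C++ sketch below shows the indexing (word i / 32, bit i % 32); `BitArray` is a hypothetical helper, not a CUDA type. In a kernel where several threads may touch the same word, the `|=` and `&=` updates would need to become `atomicOr` / `atomicAnd` so that concurrent writes to one word are not lost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A minimal packed bit array: bit i lives in word i / 32 at position i % 32.
struct BitArray {
    std::vector<std::uint32_t> words;

    // Round up so that nbits bits always fit.
    explicit BitArray(std::size_t nbits) : words((nbits + 31) / 32, 0u) {}

    void set(std::size_t i) {
        assert(i / 32 < words.size());
        words[i / 32] |= (1u << (i % 32));   // on a GPU: atomicOr
    }
    void clear(std::size_t i) {
        assert(i / 32 < words.size());
        words[i / 32] &= ~(1u << (i % 32));  // on a GPU: atomicAnd
    }
    bool test(std::size_t i) const {
        assert(i / 32 < words.size());
        return (words[i / 32] >> (i % 32)) & 1u;
    }
};
```

Packing 32 flags per word also cuts the memory traffic for a "did this element change?" array by a factor of 32 compared with one int per element.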

3 votes, 5 answers, 212 views

High-level GPU programming in C++

I've been looking into libraries/extensions for C++ that will allow GPU-based processing at a high level. I'm not an expert in GPU programming, and I don't want to dig too deep. I have a neural network ...

c++ cuda gpu gpu-programming

asked May 8 at 10:17 by goocreations

0 votes, 0 answers, 30 views

Implementing gather on a CUDA device

I am trying to implement the nearest neighbors algorithm in CUDA C. I have a query set array and a document array. I compute the similarity between each query element and each document. If the ...

cuda parallel-processing gpu-programming

asked May 4 at 18:25 by Lopez

0 votes, 0 answers, 28 views

Include map and reduce functions written in C/OpenCL in Hadoop

I have written my own map and reduce functions as OpenCL kernels. The general MapReduce scenario built into Hadoop itself is written in Java. How can I use my own ...

mapreduce opencl gpgpu gpu-programming hadoop-streaming

asked May 2 at 8:50 by sandeep.ganage

0 votes, 0 answers, 44 views

cuBLAS failed to synchronize stop event?

I'm playing with the matrixMulCUBLAS sample code and tried changing the default matrix sizes to something slightly more fun (rows=5k x cols=2.5k), and then the example fails with the error Failed to ...

cuda gpu-programming cublas

asked May 1 at 11:42 by Giovanni Azua

0 votes, 1 answer, 139 views

How to measure GPU vs. CPU performance: which timing functions should I use?

Which libraries or functions should be used to measure the time of each for an objective comparison? What caveats need to be considered for the sake of accurate ...

time cuda gpu-programming measurement

asked Apr 27 at 23:58 by Erogol
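One portable way to time the CPU side is std::chrono::steady_clock, averaged over several runs after a warm-up run. The sketch below assumes that approach; `time_ms` is a hypothetical helper. If the callable wraps a kernel launch, keep in mind that launches are asynchronous, so it should end with `cudaDeviceSynchronize()` (or the kernel can be timed on the device with `cudaEventRecord` / `cudaEventElapsedTime`) so the host timer measures the kernel and not just the launch.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` once as a warm-up, then `repeats` more times, and returns the
// mean wall-clock time per timed run in milliseconds. steady_clock is
// monotonic, so it is not affected by system clock adjustments.
double time_ms(const std::function<void()>& work, int repeats = 10) {
    assert(repeats > 0);
    work();  // warm-up: caches, lazy initialization, first-launch overhead
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        work();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / repeats;
}
```

For a fair comparison, the same measurement boundaries should apply to both sides, for example deciding explicitly whether host-device transfers count as part of the GPU time.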

1 vote, 2 answers, 60 views

Am I right in thinking that modern consumer graphics cards use exactly the same GPU structures for actual graphics rendering and bare computations?

Am I right in thinking that modern consumer graphics cards (say those conventional NVIDIA and ATI models) use exactly the same GPU structures and operations for actual graphics rendering (through ...

driver hardware gpu gpu-programming firmware

asked Apr 26 at 20:09 by Ivan

0 votes, 0 answers, 42 views

Image Reconstruction using OpenCL

I am trying to implement an algorithm using OpenCL. It can be described as: Opening or Closing; Reconstruction or Deconstruction; Contrast Image; Thresholding. I was looking for samples related to ...

image-processing opencl gpu-programming morphological-analysis

asked Apr 25 at 11:48 by baptiste

-1 votes, 1 answer, 184 views

Dual-core i7-3540M 3.0 GHz vs. quad-core i7-3632QM 2.2 GHz

I intend to buy a laptop to study parallel computing with a GPU and a multicore CPU. I don't know which is better: a dual-core i7-3540M 3.0 GHz or a quad-core i7-3632QM 2.2 GHz. Both of 2 ...

parallel-processing multicore gpu-programming

asked Apr 22 at 17:32 by Hiếu Mai Trung

0 votes, 1 answer, 100 views

CUDA and Thrust library: Trouble with using .cuh .cu and .cpp files together with -std=c++0x

I want to have a .cuh file where I can declare kernel functions and host functions as well. The implementation of these functions will be made inside the .cu file. The implementation will include the ...

c++11 cuda parallel-processing thrust gpu-programming

asked Apr 22 at 14:22 by ksm001

1 vote, 1 answer, 42 views

What happens when multiple kernels are sent to the device to be executed?

Suppose that I send two consecutive kernel calls to the device. Does it wait for the first one to complete, or does it execute them concurrently? If they are executed in parallel, do they intersect with ...

cuda gpu-programming

asked Apr 20 at 10:55 by Erogol

0 votes, 2 answers, 117 views

CUDA Thrust: reduce_by_key on only some values in an array, based off values in a “key” array

Let's say I have two device_vector<byte> arrays, d_keys and d_data. If d_data is, for example, a flattened 2D 3x5 array (e.g. { 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3 }) and d_keys is a ...

cuda gpu-programming thrust reduction

asked Apr 13 at 15:03 by JohnDoe
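It may help to pin down what thrust::reduce_by_key computes: with the default equality predicate and plus, each run of consecutive equal keys becomes one output key plus the sum of its values. The sequential sketch below reproduces those semantics for checking GPU output; `reduce_by_key_reference` is a hypothetical name and this is not Thrust's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sequential reference for reduce_by_key semantics (equality + plus):
// every run of consecutive equal keys yields one key and one summed value.
template <typename K, typename V>
std::pair<std::vector<K>, std::vector<V>>
reduce_by_key_reference(const std::vector<K>& keys, const std::vector<V>& values) {
    assert(keys.size() == values.size());
    std::vector<K> out_keys;
    std::vector<V> out_values;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || keys[i] != keys[i - 1]) {
            out_keys.push_back(keys[i]);      // start a new run
            out_values.push_back(values[i]);
        } else {
            out_values.back() += values[i];   // extend the current run
        }
    }
    return {out_keys, out_values};
}
```

Because only adjacent equal keys are merged, a key that reappears later starts a new segment. If the keys are not already grouped, they are typically sorted together with the values first, for example with thrust::sort_by_key.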

1 vote, 1 answer, 104 views

Using CUDA to efficiently compute the positions in a sorted array where an element changes

Let's say we have this sorted array: 0 1 1 1 1 2 2 2 2 2 3 10 10 10. I would like to efficiently find the positions where an element changes. For example, in our array the positions are the ...

c++ cuda parallel-processing gpu-programming

asked Apr 11 at 1:45 by ksm001
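Finding where a sorted array changes can be seen as a map followed by a compaction: each index i > 0 is compared with i - 1 independently (one GPU thread per index), and the flagged indices are then gathered. The sequential C++ sketch below shows both steps with a hypothetical `change_positions` function and zero-based indices. On the GPU, one possible mapping for the compaction step is a scan-based stream compaction such as thrust::copy_if over a counting iterator with the flags as stencil; that is a suggestion, not a measured solution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns every index i > 0 where a[i] != a[i - 1] in a sorted array.
std::vector<std::size_t> change_positions(const std::vector<int>& a) {
    // "Map" step: one independent 0/1 flag per element.
    std::vector<int> flags(a.size(), 0);
    for (std::size_t i = 1; i < a.size(); ++i) {
        assert(a[i - 1] <= a[i]);  // input must be sorted
        flags[i] = (a[i] != a[i - 1]) ? 1 : 0;
    }
    // "Compaction" step: collect the indices whose flag is set.
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        if (flags[i]) positions.push_back(i);
    }
    return positions;
}
```

With zero-based indices, this sketch returns 1, 5, 10, 11 for the array 0 1 1 1 1 2 2 2 2 2 3 10 10 10.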

2 votes, 1 answer, 98 views

Is it possible to use CUDA in order to compute the frequency of elements inside a sorted array efficiently?

I'm very new to CUDA; I've read a few chapters from books and a lot of tutorials online. I have made my own implementations of vector addition and multiplication. I would like to move a little ...

c++ cuda frequency gpu-programming sorted

asked Apr 9 at 23:52 by ksm001
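Because the array is sorted, equal values sit in contiguous runs, so counting frequencies reduces to measuring run lengths. The sketch below does this sequentially with a hypothetical `frequencies` function. On the GPU, one commonly used equivalent is thrust::reduce_by_key with the sorted array as keys and thrust::constant_iterator<int>(1) as values, which sums a 1 for every element of each run; treat that as a suggested approach rather than a tuned solution.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns (value, count) pairs for a sorted array by measuring the length
// of each run of equal values.
std::vector<std::pair<int, std::size_t>> frequencies(const std::vector<int>& sorted) {
    std::vector<std::pair<int, std::size_t>> result;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        if (i > 0) assert(sorted[i - 1] <= sorted[i]);  // input must be sorted
        if (i == 0 || sorted[i] != sorted[i - 1]) {
            result.push_back({sorted[i], 1});           // new distinct value
        } else {
            ++result.back().second;                     // same value: extend run
        }
    }
    return result;
}
```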

0 votes, 1 answer, 95 views

CUDA kernel with reduction: logic errors in the dot product of 2 matrices

I am just starting off with CUDA and am trying to wrap my brain around the CUDA reduction algorithm. In my case, I have been trying to get the dot product of two matrices. But I am getting the right ...

cuda gpu gpu-programming reduction

asked Mar 30 at 2:08 by Bhrugesh Patel
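When a reduction kernel returns wrong results, it can help to compare it against a CPU simulation of the same tree steps. The C++ sketch below simulates what one block does for a dot product: each "thread" accumulates a strided partial sum into a shared buffer, then the number of active threads halves each round. It assumes the matrices are treated as flattened vectors (an element-wise dot product) and that the block size is a power of two; `block_dot` is a hypothetical name. In a real kernel, each round must be separated by `__syncthreads()`, and with more than one block the per-block results still have to be combined (for example on the host, or with an atomic add).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU simulation of one block's shared-memory tree reduction for x . y.
float block_dot(const std::vector<float>& x, const std::vector<float>& y,
                std::size_t blockSize) {
    assert(x.size() == y.size());
    assert(blockSize > 0 && (blockSize & (blockSize - 1)) == 0);
    // Stands in for __shared__ float cache[blockSize]; thread tid sums the
    // products at indices tid, tid + blockSize, tid + 2 * blockSize, ...
    std::vector<float> cache(blockSize, 0.0f);
    for (std::size_t tid = 0; tid < blockSize; ++tid) {
        for (std::size_t i = tid; i < x.size(); i += blockSize) {
            cache[tid] += x[i] * y[i];
        }
    }
    // Tree step: only threads tid < stride are active in each round.
    for (std::size_t stride = blockSize / 2; stride > 0; stride /= 2) {
        for (std::size_t tid = 0; tid < stride; ++tid) {
            cache[tid] += cache[tid + stride];
        }
    }
    return cache[0];  // thread 0 holds the block's result
}
```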

0 votes, 0 answers, 117 views

CUDA appears to be extremely slow

Taking my first steps in CUDA, I tried this simple example code, which runs perfectly fine but appears to be extremely slow. I compiled it with nvcc version 5.0 using the commands: $ ...

cuda gpu gpgpu gpu-programming

asked Mar 27 at 20:44 by Pantelis Sopasakis

0 votes, 0 answers, 126 views

How to incorporate Hadoop File System with OpenCL/GPU Code

With reference to my previous question, my question is how to configure HDFS with other languages. I am not able to find proper tutorials on incorporating HDFS with OpenCL/CUDA code. I have written my own ...

hadoop mapreduce opencl gpgpu gpu-programming

asked Mar 23 at 15:13 by sandeep.ganage

0 votes, 0 answers, 43 views

GPU drivers: how do they do it? [closed]

How exactly do open-source GPU driver programmers go about their business, assuming they don't have an X-ray machine or an electron microscope? There is no JTAG on a GPU (am I right?) or other ...

logic driver reverse-engineering gpu-programming machine-language

asked Mar 23 at 3:29 by user108754

0 votes, 1 answer, 108 views

CUDA FFT exception

I'm trying to use the CUDA FFT (cuFFT) library. The problem occurs when cufftPlan1d(..) throws an exception. #define NX 256 #define BATCH 10 cufftHandle plan; cufftComplex *data; ...

c++ cuda gpu gpgpu gpu-programming

asked Mar 21 at 10:52 by TripleS

0 votes, 2 answers, 102 views

How to use the Hadoop MapReduce framework for an OpenCL application?

I am developing an application in OpenCL whose basic objective is to implement a data mining algorithm on a GPU platform. I want to use the Hadoop Distributed File System and want to execute the application ...

hadoop mapreduce opencl gpu-programming hadoop-partitioning

asked Mar 19 at 9:30 by sandeep.ganage

0 votes, 0 answers, 28 views

Selecting a GPU when sending images

I have a strange situation. I installed 2 video cards in the same computer, and now I have to send images/frames through these video cards. But I have no idea how to select a GPU for sending my data ...

c# gpu gpu-programming

asked Mar 17 at 8:54 by mst

1 vote, 0 answers, 319 views

Run OpenCL program on NVIDIA hardware

I've built a simple OpenCL-based program (in C++) and tested it on a Windows 8 system with an AMD FirePro V4900 card. I was using the AMD APP SDK. When I copy my binaries to the other machine (Windows 8 with ...

opencl gpgpu gpu-programming

asked Mar 8 at 23:40 by Alexey

0 votes, 0 answers, 82 views

Scattering in CUDA

I'm trying to implement the following: for (unsigned int j = 0; j < numElems; ++j) { unsigned int bin = (input[j] & mask) >> offset; output[source[bin]] = input[j]; source[bin]++; ...

design-patterns cuda gpu gpu-programming scatter
asked Mar 5 at 13:43 by facunvd
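The loop in the scattering question reads like the scatter phase of a counting-sort (radix-sort) pass: `source[bin]` must already hold each bin's starting offset, which is the exclusive prefix sum of the bin histogram. Assuming that interpretation, a self-contained C++ version of one pass might look like the sketch below; `radix_pass` and `numBins` are hypothetical names, while `mask`, `offset`, `bin` and `source` follow the question. The `source[bin]++` is what serializes the loop; one common GPU approach replaces it with per-element destinations computed by a scan, so each thread knows where to write without a shared counter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One stable radix-sort pass on the digit selected by mask/offset.
std::vector<unsigned int> radix_pass(const std::vector<unsigned int>& input,
                                     unsigned int mask, unsigned int offset,
                                     unsigned int numBins) {
    // 1. Histogram: how many elements fall into each bin.
    std::vector<unsigned int> histo(numBins, 0);
    for (unsigned int v : input) {
        unsigned int bin = (v & mask) >> offset;
        assert(bin < numBins);
        ++histo[bin];
    }
    // 2. Exclusive scan: source[b] = number of elements in bins < b.
    std::vector<unsigned int> source(numBins, 0);
    for (unsigned int b = 1; b < numBins; ++b) {
        source[b] = source[b - 1] + histo[b - 1];
    }
    // 3. Scatter, as in the question: each element goes to its bin's next slot.
    std::vector<unsigned int> output(input.size());
    for (std::size_t j = 0; j < input.size(); ++j) {
        unsigned int bin = (input[j] & mask) >> offset;
        output[source[bin]] = input[j];
        source[bin]++;
    }
    return output;
}
```

Because elements are visited in input order, equal digits keep their relative order, which is the stability property later radix passes rely on.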

1 vote, 2 answers, 237 views

Accessing GPU via web browser

I came across this proof of concept earlier today (on TechCrunch.com) and was blown away and intrigued as to how they had managed to accomplish the end result. They state that they don't use WebGL or ...

javascript 3d gpu gpu-programming

asked Mar 4 at 23:36 by Giles Thompson

-1 votes, 1 answer, 76 views

Reading CUDA data in burst mode

I currently have CUDA code that is performing around 3-4x slower than CPU code. I removed all extraneous CPU/GPU transfers so that most of the computation is being done on the GPU, and only the final ...

c++ c cuda gpu gpu-programming

asked Feb 27 at 22:06 by assassin

1 vote, 1 answer, 59 views

OpenCL: passing parameters to a kernel using extern

Instead of using 'setKernelArg' to pass a parameter to the kernel function, can we use extern? For example: cl_mem countMobj; //device variable Suppose I have to pass this variable to ...

opencl gpgpu gpu-programming

asked Feb 27 at 11:46 by sandeep.ganage

0 votes, 0 answers, 58 views

GPU computing with MATLAB on Mountain Lion

What's the status of GPU computing on Mountain Lion (10.8.2) on a Retina machine? I have this configuration, but MATLAB (R2012b) couldn't find gpuDevice on my rMBP. I'm a bit skeptical about ...

osx-mountain-lion matlab-toolbox gpu-programming

asked Feb 21 at 9:29 by user2047050

-4 votes, 1 answer, 78 views

How to read the GigaThread global scheduler? [duplicate]

Is it possible to access the code of the GigaThread global scheduler? My intention is to know how many SMs are employed by the scheduler at a given instant (upon the assumption that GigaThread global ...

cuda gpgpu gpu-programming

asked Feb 16 at 9:08 by user1550304

-1 votes, 1 answer, 231 views

How does nvidia-smi work?

What is the internal operation that allows nvidia-smi to fetch hardware-level details? The tool executes even when some process is already running on the GPU device and gets the utilization details, ...

cuda gpgpu nvidia gpu-programming

asked Feb 16 at 3:40 by user1550304

1 vote, 1 answer, 256 views

F# GPU programming vs. KDB for crunching data: which is fastest?

I would like to hear about anyone's experience with the most cost-effective and efficient way of crunching huge amounts of data with either F# GPU (using a C NVIDIA GPU API type provider for ...

f# gpu-programming kdb

asked Feb 7 at 17:33 by Nikos

0 votes, 1 answer, 84 views

Error CL_OUT_OF_RESOURCES when reading data back into host memory while using an atomic function in an OpenCL kernel

I am trying to use atomic functions in my OpenCL kernel. Multiple threads I am creating are trying to write to a single memory location in parallel. I want them to perform serial execution on that ...

opencl gpu gpgpu gpu-programming

asked Feb 5 at 5:59 by sandeep.ganage
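The behavior the atomics question is after (many threads updating one location without losing updates) can be illustrated on the host with std::atomic. The sketch below is only an analogy for what an OpenCL atomic such as atomic_inc or atomic_add does on a single __global int; it does not diagnose the CL_OUT_OF_RESOURCES error, and `count_with_atomics` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Many threads increment one shared counter. fetch_add makes each
// read-modify-write indivisible, so no increment is lost even though
// all threads target the same memory location.
int count_with_atomics(int numThreads, int incrementsPerThread) {
    assert(numThreads > 0 && incrementsPerThread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&counter, incrementsPerThread] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();  // wait for all updates to finish
    return counter.load();
}
```

Atomics guarantee that each update is applied exactly once, but not in any particular order, and heavy contention on one address serializes the updates, which can make such kernels slow.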

0 votes, 1 answer, 83 views

CUDA kernel fails to launch

Here is my code. I have an array of (x,y) pairs. I want to calculate the farthest point for each coordinate. #define GPUERRCHK(ans) { gpuAssert((ans), __FILE__, __LINE__); } inline void ...

cuda gpgpu gpu-programming

asked Feb 2 at 14:14 by mkuse
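A sequential reference for the farthest-point computation is handy for checking the kernel's output once it launches. The sketch below assumes a brute-force O(n^2) search with squared Euclidean distance (the square root is unnecessary for comparisons); `Point` and `farthest_points` are hypothetical names. Each outer iteration is independent, which matches a one-thread-per-point kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y; };

// For every point i, returns the index of the point farthest from it.
std::vector<std::size_t> farthest_points(const std::vector<Point>& pts) {
    std::vector<std::size_t> result(pts.size(), 0);
    for (std::size_t i = 0; i < pts.size(); ++i) {   // one "thread" per point
        float best = -1.0f;
        for (std::size_t j = 0; j < pts.size(); ++j) {
            float dx = pts[i].x - pts[j].x;
            float dy = pts[i].y - pts[j].y;
            float d2 = dx * dx + dy * dy;            // squared distance
            if (d2 > best) { best = d2; result[i] = j; }
        }
    }
    return result;
}
```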

1 vote, 1 answer, 210 views

Are CUDA .ptx files portable?

I'm studying the cudaDecodeD3D9 sample to learn how CUDA works, and at compilation it generates a .ptx file from a .cu file. This .ptx file is, as I understand it so far, an intermediate ...

c++ cuda gpgpu gpu-programming

asked Feb 1 at 18:48 by Asik

1 vote, 1 answer, 134 views

CUDA not so fast compared to CPU with OpenMP?

I am trying to compute the cross-correlation among 450 vectors, each of size 20000. On the CPU, I stored the data in a 2D matrix with rows=20000 and cols=450. The serial code for the ...

cuda gpu gpgpu gpu-programming

asked Jan 31 at 19:41 by mkuse
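Assuming "cross-correlation" here means the zero-lag Pearson correlation between every pair of the 450 vectors (an assumption; lagged cross-correlation would need a different computation), a CPU reference using the question's layout (one vector per column of a rows x cols matrix) could look like the sketch below. `correlation_matrix` is a hypothetical name. After each column is centered once, all the pairwise sums are entries of the product of the centered matrix's transpose with itself, which suggests a cuBLAS GEMM (for example cublasDgemm) as one possible GPU formulation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// data is row-major with `rows` samples and `cols` vectors (vector c is
// column c). Returns the cols x cols matrix of zero-lag Pearson correlations.
std::vector<double> correlation_matrix(const std::vector<double>& data,
                                       std::size_t rows, std::size_t cols) {
    assert(data.size() == rows * cols && rows > 1);
    // Center each column once so every pair can reuse it.
    std::vector<double> centered(data);
    for (std::size_t c = 0; c < cols; ++c) {
        double mean = 0.0;
        for (std::size_t r = 0; r < rows; ++r) mean += data[r * cols + c];
        mean /= static_cast<double>(rows);
        for (std::size_t r = 0; r < rows; ++r) centered[r * cols + c] -= mean;
    }
    // Each (a, b) pair is independent: a natural unit of GPU parallelism.
    std::vector<double> corr(cols * cols, 0.0);
    for (std::size_t a = 0; a < cols; ++a) {
        for (std::size_t b = 0; b < cols; ++b) {
            double sab = 0.0, saa = 0.0, sbb = 0.0;
            for (std::size_t r = 0; r < rows; ++r) {
                double xa = centered[r * cols + a];
                double xb = centered[r * cols + b];
                sab += xa * xb;
                saa += xa * xa;
                sbb += xb * xb;
            }
            corr[a * cols + b] = sab / std::sqrt(saa * sbb);
        }
    }
    return corr;
}
```

The matrix is symmetric, so a real implementation would compute only b >= a; the full loop is kept here for clarity.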

1 vote, 0 answers, 55 views

Number of working GPU SMs [closed]

Is it possible to monitor the number of free SMs at a given point in time? How is gpu_sm_speed calculated? Is it an average, or is it per SM (I guess the execution time of each SM can be ...

cuda gpu gpgpu nvidia gpu-programming

asked Jan 31 at 5:26 by user1550304

2 votes, 2 answers, 166 views

NVIDIA Nsight Debugging on GTX 480

I have one machine with a GeForce GTX 480, but I can't debug or run analysis activities on it. This error appears when I debug or run an analysis activity: The remote system is logged in through Remote ...

cuda gpu-programming nsight

asked Jan 30 at 18:44 by farzad

0 votes, 1 answer, 181 views

GPU gives no performance improvement in Julia set computation

I am trying to compare CPU and GPU performance. CPU: Intel® Core™ i5 CPU M 480 @ 2.67GHz × 4; GPU: NVIDIA GeForce GT 420M. I can confirm that the GPU is configured and works correctly with ...

cuda gpu gpgpu gpu-programming

asked Jan 30 at 11:28 by mkuse

0 votes, 1 answer, 139 views

OpenCL local memory: only half the threads in a work-group execute correctly

I have written an OpenCL kernel that uses local memory to get faster execution. This is the first time I am using local memory. My global_work_size = 16 and local_work_size = 8. OpenCL kernel: ...

opencl gpu gpgpu gpu-programming

asked Jan 28 at 13:53 by sandeep.ganage

0 votes, 0 answers, 16 views

Should I build my project on the Nsight host machine or the target machine and then debug it?

I have configured remote debugging. My host has a GeForce 6200 TurboCache and my target has a GeForce 8600 GT GPU. Should I build my project on the host machine? If yes, I can't build my CUDA project on ...

nvidia gpu-programming nsight

asked Jan 27 at 12:41 by farzad

0 votes, 2 answers, 145 views

Nsight skips (ignores) breakpoints in VS10 CUDA when debugging remotely, but debugging locally on the target machine works fine

When I debug my CUDA project remotely from the host, it ignores breakpoints but executes completely. But when I debug my project locally on the target machine, it works fine. I checked my driver ...

cuda gpu-programming nsight cuda-gdb

asked Jan 25 at 15:48 by farzad

0 votes, 1 answer, 80 views

How to allocate all of the available shared memory to a single block in CUDA?

I want to allocate all the available shared memory of an SM to one block. I am doing this because I don't want multiple blocks to be assigned to the same SM. My GPU card has 64KB (Shared+L1) memory. ...

memory cuda gpu gpu-programming

asked Jan 22 at 19:30 by Iman
