See the tag entry for "gpu".


-1
votes
0answers
21 views

PyCUDA “CUDA compiler succeeded, but said the following…” warning message in Linux. Is PyCUDA properly installed? [closed]

I have compiled PyCUDA in Ubuntu 12.04, and only some of the demo PyCUDA examples work fine. I can run the hello_gpu.py script, and a few wiki examples like MatrixmulSimple.py. But I get errors for ...
-2
votes
0answers
37 views

What are the basic concepts used in cublas sgemm

I am wondering which optimizations are used in cublas' sgemm implementation. Let's start out with the naive implementation. One thread per matrix entry: __global__ void sgemm_naive(float alpha, float ...
0
votes
1answer
43 views

CUDA: Allocate memory for auxiliary data to the shared memory of each block efficiently

Suppose that we have an array int * data, and each thread will access one element of this array. Since this array will be shared among all threads, it will be saved in global memory. Let's create ...
0
votes
1answer
41 views

OpenCL pixel check

I have an OpenCL kernel that does some warping on an image. This is a forward mapping, and each kernel instance handles the mapping/warping of one pixel in the source image. This means that some ...
0
votes
1answer
44 views

platforms in OpenCL

I have an Nvidia graphics card (GeForce GT 640) on my motherboard. I have installed OpenCL on my box. When I query the platform using "clGetPlatformInfo(parameters)", I see the following output: ...
1
vote
0answers
66 views

CudaMalloc not working

I'm writing code to transfer a 3D array from host to device, edit the array on the device, and transfer all of the memory back. I've cut it down to the core code shown below, but still cannot get this to ...
0
votes
1answer
44 views

OpenGL Shaders Generate Colors

I have a 10x10 pixel sprite. How can I change its colors with a shader program in real time? All the blue on the sprite should turn green. All the green on the sprite should turn white, etc. ...
0
votes
1answer
46 views

OpenCV 2.4.4 with CUDA: registerPageLocked fails

I am trying to page-lock a Mat that has already been created. Consider the following example code: ... Mat cpuGray; GpuMat gpuGray; cv::cvtColor (cpuColor, cpuGray, CV_BGR2GRAY); ...
2
votes
1answer
54 views

Is it possible to deallocate memory for the N last elements of a thrust::device_vector without using resize?

I'm using a device_vector in order to store information about an array of user input data. This information is necessary in order to speed things up when I call the second kernel, which runs the main ...
0
votes
0answers
34 views

Is it possible to use an array of bits or something like a std::bitset in CUDA?

I have an input array that is given to a kernel. Each thread works with one value of the array and either changes the value or doesn't change it at all according to an algorithm. I would like to find ...
3
votes
5answers
212 views

High level GPU programming in C++

I've been looking into libraries/extensions for C++ that will allow GPU-based processing on a high level. I'm not an expert in GPU programming and I don't want to dig too deep. I have a neural network ...
0
votes
0answers
30 views

Implementing Gather in Cuda device

I am trying to implement the nearest neighbors algorithm in CUDA C. I have a query set array and a document array. I find the similarity between each query element and document. If the ...
0
votes
0answers
28 views

Include map and reduce written in C/OpenCL in Hadoop

I have written my own map and reduce functions as OpenCL kernels. The general MapReduce scenario, which is built into Hadoop itself, is written in Java. How can I use my own ...
0
votes
0answers
44 views

cublas failed to synchronize stop event?

I'm playing with the matrixMulCUBLAS sample code and tried changing the default matrix sizes to something slightly more fun rows=5k x cols=2.5k and then the example fails with the error Failed to ...
0
votes
1answer
139 views

How to measure GPU vs CPU performance, and with which time-measuring functions?

What libraries or functions should be used to time each of them so that the comparison is objective? What caveats also need to be considered for the sake of accurate ...
1
vote
2answers
60 views

Am I right thinking that modern consumer graphics cards use exactly the same GPU structures for actual graphics rendering and bare computations?

Am I right thinking that modern consumer graphics cards (say those conventional nVidia and ATi models) use exactly the same GPU structures and operations for actual graphics rendering (through ...
0
votes
0answers
42 views

Image Reconstruction using OpenCL

I am trying to implement an algorithm using OpenCL. It can be described as : Opening or Closing Reconstruction or Deconstruction Contrast Image Thresholding I was looking for samples related to ...
-1
votes
1answer
184 views

Dual core i7-3540M 3.0 GHz vs Quad core i7-3632QM 2.2 GHz

I intend to buy a laptop to study parallel computing with GPU and multicore CPU. I don't know which is the better one between a Dual core i7-3540M 3.0 GHz and a Quad core i7-3632QM 2.2 GHz. Both of 2 ...
0
votes
1answer
100 views

CUDA and Thrust library: Trouble with using .cuh .cu and .cpp files together with -std=c++0x

I want to have a .cuh file where I can declare kernel functions and host functions as well. The implementation of these functions will be made inside the .cu file. The implementation will include the ...
1
vote
1answer
42 views

What happens when multiple kernels are sent to the device to be executed?

Suppose that I have sent two consecutive kernel calls to the device. Does it wait for the first one to complete, or does it execute them concurrently? If they are executed in parallel, do they intersect with ...
0
votes
2answers
117 views

CUDA Thrust: reduce_by_key on only some values in an array, based off values in a “key” array

Let's say I have two device_vector<byte> arrays, d_keys and d_data. If d_data is, for example, a flattened 2D 3x5 array ( e.g. { 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3 } ) and d_keys is a ...
1
vote
1answer
104 views

Use CUDA in order to compute efficiently the positions of a sorted array where an element changes

Let's say we have this sorted array 0 1 1 1 1 2 2 2 2 2 3 10 10 10 I would like to find efficiently the positions where an element changes. For example in our array the positions are the ...
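
For reference, a minimal serial sketch of the check this question describes, written in plain C++ rather than CUDA. The function name `change_positions` is hypothetical, and the convention used (report every index i >= 1 whose value differs from the value at i - 1) is an assumption, since the excerpt cuts off before listing the expected positions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns every index i >= 1 where sorted[i] differs
// from sorted[i - 1]. The indexing convention is an assumption; the
// question's own expected output is elided in the excerpt.
std::vector<std::size_t> change_positions(const std::vector<int>& sorted) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 1; i < sorted.size(); ++i) {
        // Each comparison reads only elements i and i - 1, so iterations
        // do not depend on one another.
        if (sorted[i] != sorted[i - 1]) {
            positions.push_back(i);
        }
    }
    return positions;
}
```

Under that convention, the array in the excerpt yields the indices 1, 5, 10 and 11. Because each comparison is independent, a GPU version could plausibly assign one thread per index and then compact the flagged positions, but that mapping is a sketch and not something the excerpt confirms.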
2
votes
1answer
98 views

Is it possible to use CUDA in order to compute the frequency of elements inside a sorted array efficiently?

I'm very new to Cuda, I've read a few chapters from books and read a lot of tutorials online. I have made my own implementations on vector addition and multiplication. I would like to move a little ...
0
votes
1answer
95 views

Cuda Kernel with reduction - logic errors for dot product of 2 matrices

I am just starting off with CUDA and am trying to wrap my brain around the CUDA reduction algorithm. In my case, I have been trying to get the dot product of two matrices. But I am getting the right ...
0
votes
0answers
117 views

CUDA appears to be extremely slow

Taking my first steps in CUDA, I tried this simple example code, which runs perfectly fine but appears to be extremely slow. I compiled it using nvcc version 5.0 using the commands: $ ...
0
votes
0answers
126 views

How to incorporate Hadoop File System with OpenCL/GPU Code

With reference to my previous question, my doubt is how to configure HDFS with other languages. I am not able to find proper tutorials to incorporate HDFS with OpenCL/CUDA code. I have written my own ...
0
votes
0answers
43 views

gpu driver — how do they do it? [closed]

How exactly do the open source gpu driver programmers go about their business? Assuming they don't have an x ray machine or an electron microscope. There is no JTAG on a gpu (am I right?) or other ...
0
votes
1answer
108 views

CUDA FFT exception

I'm trying to use the CUDA FFT (aka cufft) library. The problem occurs when cufftPlan1d(..) throws an exception. #define NX 256 #define BATCH 10 cufftHandle plan; cufftComplex *data; ...
0
votes
2answers
102 views

How to use the Hadoop MapReduce framework for an OpenCL application?

I am developing an application in OpenCL whose basic objective is to implement a data mining algorithm on a GPU platform. I want to use Hadoop Distributed File System and want to execute the application ...
0
votes
0answers
28 views

GPU selection on sending image

I have a strange situation. I installed 2 video cards on the same computer. And now I have to send images/frames through these video cards. But I don't have any idea how to select a GPU for sending my data ...
1
vote
0answers
319 views

Run OpenCL program on NVIDIA hardware

I've built a simple OpenCL-based program (in C++) and tested it on a Windows 8 system with an AMD FirePro V4900 card. I was using the AMD APP SDK. When I copy my binaries to the other machine (Windows 8 with ...
0
votes
0answers
82 views

Scattering on CUDA

I'm trying to implement the following: for (unsigned int j = 0; j < numElems; ++j) { unsigned int bin = (input[j] & mask) >> offset; output[source[bin]] = input[j]; source[bin]++; ...
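
A self-contained serial version of the loop quoted above can serve as a CPU reference when checking a GPU port. This is only a sketch: the function name `scatter_serial` is hypothetical, and it assumes that `source` holds the starting output offset of each bin (for example, an exclusive prefix sum of the bin histogram), which the truncated excerpt does not state.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the scatter loop in the question.
// Assumption: source[b] is the first output slot reserved for bin b.
// source is taken by value so the caller's offsets are left unchanged.
std::vector<unsigned int> scatter_serial(const std::vector<unsigned int>& input,
                                         std::vector<unsigned int> source,
                                         unsigned int mask,
                                         unsigned int offset) {
    std::vector<unsigned int> output(input.size());
    for (std::size_t j = 0; j < input.size(); ++j) {
        unsigned int bin = (input[j] & mask) >> offset;  // extract the bin bits
        output[source[bin]] = input[j];                  // write to the bin's next free slot
        source[bin]++;                                   // advance that bin's cursor
    }
    return output;
}
```

The increment of `source[bin]` makes each iteration depend on the earlier ones that hit the same bin, and the order of those increments is what keeps elements within a bin in their input order. A parallel version therefore cannot simply run one unsynchronized thread per element.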
1
vote
2answers
237 views

Accessing GPU via web browser

I came across this proof of concept earlier today (on TechCrunch.com) and was blown away and intrigued as to how they had managed to accomplish the end result. They state that they don't use webGL or ...
-1
votes
1answer
76 views

reading cuda data in burst mode

I currently have CUDA code that is performing around 3-4x slower than CPU code. I removed all extraneous CPU/GPU transfers so that most of the computation is being done on the GPU, and only the final ...
1
vote
1answer
59 views

Opencl parameter passing to kernel using extern

Instead of using 'setKernelArg' for passing the parameter to the kernel function, can we use extern? For example: cl_mem countMobj; //device variable Suppose I have to pass this variable to ...
0
votes
0answers
58 views

GPU computing with Matlab on Mountain Lion

What's the status of GPU computing on Mountain Lion (10.8.2) on a retina machine? I have this configuration but Matlab (R2012b) couldn't find gpuDevice on my rMBP. I'm a bit skeptical about ...
-4
votes
1answer
78 views

How to read GigaThread global scheduler? [duplicate]

Is it possible to access the code of GigaThread global scheduler? My intention is to know how many SMs are employed by the scheduler at a given instant (upon the assumption that GigaThread global ...
-1
votes
1answer
231 views

how does nvidia-smi work?

What is the internal operation that allows nvidia-smi to fetch the hardware-level details? The tool executes even when some process is already running on the GPU device and gets the utilization details, ...
1
vote
1answer
256 views

F# GPU programing vs KDB for crunching data, what is the fastest?

Hi, I would like to ask for anyone's experience on what is the most cost-effective and efficient way of crunching huge amounts of data with either F# GPU (using a C Nvidia GPU api typeprovider for ...
0
votes
1answer
84 views

error CL_OUT_OF_RESOURCES while reading back data in host memory while using atomic function in opencl kernel

I am trying to implement atomic functions in my OpenCL kernel. The multiple threads I am creating are trying to write to a single memory location in parallel. I want them to perform serial execution on that ...
0
votes
1answer
83 views

Cuda Kernel Fails to launch

Here is my code. I have an array of (x,y) pairs. I want to calculate for each co-ordinate the farthest point. #define GPUERRCHK(ans) { gpuAssert((ans), __FILE__, __LINE__); } inline void ...
1
vote
1answer
210 views

Are CUDA .ptx files portable?

I'm studying the cudaDecodeD3D9 sample to learn how CUDA works, and at compilation it generates a .ptx file from a .cu file. This .ptx file is, as I understand it so far, an intermediate ...
1
vote
1answer
134 views

CUDA not so fast against CPU with OpenMP?

I am trying to compute cross-correlation amongst 450 vectors, each of size 20000. While doing this on the CPU, I stored the data in a 2D matrix with rows=20000 and cols=450. The serial code for the ...
1
vote
0answers
55 views

Number of working GPU SMs [closed]

Is it possible to monitor the number of SMs free at a given point in time? How is gpu_sm_speed calculated? Is that the average or of individual SMs (I guess the execution time of each SM can be ...
2
votes
2answers
166 views

NVIDIA Nsight Debugging on GTX 480

I have one machine with GeForce GTX 480 but I can't debug or run analysis activity on it. This error appears when I debug or run analysis activity: The remote system is logged in through Remote ...
0
votes
1answer
181 views

GPU gives no performance improvement in Julia set computation

I am trying to compare performance in CPU and GPU. I have CPU : Intel® Core™ i5 CPU M 480 @ 2.67GHz × 4 GPU : NVidia GeForce GT 420M I can confirm that GPU is configured and works correctly with ...
0
votes
1answer
139 views

opencl local memory half threads from a group gets correct execution

I have written a kernel in opencl using local memory to get the faster execution. This is the first time I am using local memory. My global_work_size = 16 and local_work_size = 8. Opencl kernel: ...
0
votes
0answers
16 views

Should I build my project on the Nsight host machine or target machine and then debug it?

I configured remote debugging. My host has a GeForce 6200 TurboCache and my target has a GeForce 8600 GT GPU. Should I build my project on the host machine? If yes, I can't build my CUDA project on ...
0
votes
2answers
145 views

Nsight skips (ignores) breakpoints in VS10 CUDA when debugging remotely, but debugging locally on the target machine works fine

When I debug my CUDA project remotely on the host, it ignores breakpoints but executes completely. But when I debug my project locally on the target machine, it works fine. I checked my driver ...
0
votes
1answer
80 views

How to allocate all of the available shared memory to a single block in CUDA?

I want to allocate all the available shared memory of an SM to one block. I am doing this because I don't want multiple blocks to be assigned to the same SM. My GPU card has 64KB (Shared+L1) memory. ...
