See the tag entry for "gpu".
0
votes
0 answers
2 views
How can I run and test the NVENC API on CentOS Linux?
We have a server with a Kepler graphics card and the NVIDIA driver already installed.
How can I run NVENC (hardware video encoding) and use its SDK on CentOS 6.4?
How can I test whether it is ...
0
votes
0 answers
7 views
Is there any good tutorial or reference for writing code with Magma?
I am currently trying to use Magma to do matrix operations on the GPU, but I have found little documentation for it. The only things I can refer to are its test programs and the online generated documentation (here), ...
0
votes
1 answer
90 views
How to solve “expected an identifier” error in CUDA
I'm having a problem with a kernel in CUDA C programming when compiling line 5. I got an "expected an identifier" error. Why is this happening?
My kernel function is the following:
__global__ void ...
0
votes
1 answer
24 views
How does multithreading in GPUs work?
How does a GPU handle multithreading?
In CPUs, for example, each thread has its own independent copy of the register file. But with register files as large as those in GPUs, that would be impossible. So ...
1
vote
1 answer
24 views
How to set the right alignment for an OpenCL array of structs?
I have the following structure:
C++:
struct ss{
cl_float3 pos;
cl_float value;
cl_bool moved;
cl_bool nextMoved;
cl_int movePriority;
cl_int nextMovePriority;
cl_float ...
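A minimal host-side sketch of how the layout of such a struct can be checked at compile time. It does not use the OpenCL headers: `Float3` is a hypothetical stand-in for `cl_float3` (which the headers define as a 16-byte type with the same storage as `cl_float4`), and `Particle` is a hypothetical reduced struct, not the full struct from the question, which is cut off above. The field names `moved` and `priority` are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for cl_float3: 16 bytes, 16-byte aligned.
struct alignas(16) Float3 {
    float s[4];  // the fourth component is padding
};

// Hypothetical reduced struct used only to illustrate the layout checks.
struct alignas(16) Particle {
    Float3 pos;             // offset 0, 16 bytes
    float value;            // offset 16
    std::uint32_t moved;    // offset 20 (cl_bool is a 32-bit unsigned int on the host)
    std::int32_t priority;  // offset 24
    // 4 bytes of tail padding round sizeof(Particle) up to 32
};

// Compile-time checks: if the kernel-side struct is declared with the same
// member order and types, these are the numbers it should agree with.
static_assert(alignof(Particle) == 16, "struct must be 16-byte aligned");
static_assert(offsetof(Particle, value) == 16, "value follows the padded float3");
static_assert(sizeof(Particle) == 32, "array stride on the host is 32 bytes");
```

Checks like these catch a host/device mismatch early: if the array stride on the host differs from the stride the kernel assumes, every element after the first is read from the wrong offset.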
0
votes
1 answer
38 views
Find the closest weight vector to each instance in the data matrix
Suppose I have an n×m weight matrix W, where m is the number of variables and n is the number of instances. I also have a data matrix X of the same size. I am trying to find the closest weight vector to ...
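A plain CPU reference for the search described in the title can be useful for validating a GPU version. The sketch below assumes "closest" means smallest squared Euclidean distance, which the excerpt does not state; the `Matrix` alias and the function name `closestWeightRows` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Row-major matrix: one inner vector per row, all rows the same length.
using Matrix = std::vector<std::vector<double>>;

// For each row of X, returns the index of the row of W with the smallest
// squared Euclidean distance (assumed metric). Ties keep the lowest index.
std::vector<std::size_t> closestWeightRows(const Matrix& W, const Matrix& X) {
    assert(!W.empty());
    std::vector<std::size_t> best(X.size(), 0);
    for (std::size_t i = 0; i < X.size(); ++i) {
        double bestDist = std::numeric_limits<double>::infinity();
        for (std::size_t k = 0; k < W.size(); ++k) {
            assert(W[k].size() == X[i].size());
            double dist = 0.0;
            for (std::size_t j = 0; j < X[i].size(); ++j) {
                const double d = X[i][j] - W[k][j];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best[i] = k;
            }
        }
    }
    return best;
}
```

The outer loop over instances is independent per row, so it is the natural candidate for one GPU thread (or block) per instance.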
0
votes
0 answers
13 views
Parallax Occlusion Mapping with Silhouettes
Does anyone know how to implement the correct silhouette effect shown in this YouTube video?
Actually, I understand (and successfully implemented) the parallax occlusion mapping algorithm, but I have no ...
0
votes
1 answer
35 views
Using shaders for long computations without causing lag
I am trying to use a compute shader with DirectX 11 to do some simple but expensive calculations (think Mandelbrot set). The results of the calculation are placed on a texture and are ...
0
votes
1 answer
53 views
CUDA: When can memory coalescing be achieved?
I have trouble understanding this concept. I've researched a lot online and the only thing I understood is that threads need to access consecutive data.
So if we have an array of 10000 integers, if ...
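One way to build intuition for "threads need to access consecutive data" is to count how many aligned memory segments a group of threads touches for a given access pattern. The sketch below is a host-side model only, not CUDA code. The defaults of 32 threads per warp and 128-byte segments are simplifying assumptions (actual transaction sizes depend on the GPU architecture), and `segmentsTouched` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Counts the distinct aligned segments touched when thread t of a warp
// reads the 4-byte integer at index t * stride.
// warpSize and segmentBytes are modelling assumptions, not queried values.
std::size_t segmentsTouched(std::size_t stride,
                            std::size_t warpSize = 32,
                            std::size_t segmentBytes = 128) {
    assert(segmentBytes > 0);
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < warpSize; ++t) {
        const std::size_t byteAddress = t * stride * sizeof(std::int32_t);
        segments.insert(byteAddress / segmentBytes);
    }
    return segments.size();
}
```

Under these assumptions, `segmentsTouched(1)` is 1 (the whole warp is served from one segment), `segmentsTouched(2)` is 2, and `segmentsTouched(32)` is 32, meaning every thread lands in its own segment and most of the fetched bytes are wasted.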
0
votes
0 answers
5 views
Running AMD GPU Assembly
I am trying to run AMD GPU Assembly on my PC. I am using Ubuntu 12.04 64-bit and Windows 7 Ultimate. I am using a 6XX GPU. Please tell me how to run it. Links to good resources would also be helpful. If you can ...
2
votes
1 answer
41 views
CUDA: How does Thrust manage memory when using a Comparator in a sorting function?
I have a char array of 10 characters that I would like to pass as an argument to a comparator which will be used by Thrust's sorting function.
In order to allocate memory for this array I use ...
-2
votes
0 answers
33 views
CUDA: tridiagonalization algorithm giving wrong results after a few iterations; see anything wrong? [closed]
I am trying to parallelize the tridiagonalization of a matrix from Numerical Recipes in C and to compare the answers (and eventually the computation speed) for different matrix sizes. I have run into a ...
0
votes
0 answers
47 views
CUDA: sum-reduction — data lost in call to device function
I am writing CUDA sum-reduction code that takes the sum of the absolute values of an array from element begin_index through end_index (I am using one block with a variable number of threads). ...
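When a device-side reduction gives unexpected results, a host-side reference that follows the same access pattern can help narrow down where values go missing. The sketch below emulates a single-block reduction on the CPU. It is not the asker's code and does not diagnose the problem. It assumes the thread count is a power of two and that `end_index` is inclusive; the function name `absSumReduction` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sums |data[i]| for i in [beginIndex, endIndex] (inclusive), mimicking a
// one-block reduction with numThreads threads (assumed to be a power of two).
double absSumReduction(const std::vector<double>& data,
                       std::size_t beginIndex, std::size_t endIndex,
                       std::size_t numThreads) {
    assert(numThreads > 0 && (numThreads & (numThreads - 1)) == 0);
    assert(beginIndex <= endIndex && endIndex < data.size());

    // Stands in for the block's shared-memory buffer.
    std::vector<double> partial(numThreads, 0.0);

    // Phase 1: each emulated thread accumulates a strided slice of the range.
    for (std::size_t t = 0; t < numThreads; ++t) {
        for (std::size_t i = beginIndex + t; i <= endIndex; i += numThreads) {
            partial[t] += std::fabs(data[i]);
        }
    }

    // Phase 2: tree reduction with a halving stride. On a GPU, each pass of
    // the outer loop would be separated by a block-wide barrier.
    for (std::size_t stride = numThreads / 2; stride > 0; stride /= 2) {
        for (std::size_t t = 0; t < stride; ++t) {
            partial[t] += partial[t + stride];
        }
    }
    return partial[0];
}
```

Comparing the contents of `partial` after each phase against the device's shared-memory buffer at the same point is one way to see at which step the two diverge.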
2
votes
1 answer
52 views
How to choose a non busy CUDA device?
I'm working on a cluster with a lot of nodes, and each node has two GPUs. In the cluster, I can't launch "nvidia-smi" to check which device is busy. My code selects the best device (with ...
1
vote
1 answer
52 views
CUDA: Allocate memory for auxiliary data in the shared memory of each block efficiently
Suppose that we have an array int * data.
Each thread will access one element of this array. Since the array is shared among all threads, it will be stored in global memory.
Let's create ...