Game Development Stack Exchange is a question and answer site for professional and independent game developers. It's 100% free, no registration required.

I'm trying to use a uniform array of matrices in my compute shader, but it's really slow. I've narrowed the problem down to these lines of code:

uniform mat4 someMatrixArray[64]; 
...
vec4 result = vec4(0);
for (int i = 0; i < 2048; i++) {
    result += someMatrixArray[i%64][0] * 0.01;
}

When not accessing someMatrixArray I get 700+ fps; when accessing it, I get 10 fps. Does anybody have an idea what could cause this? Could it be a driver issue?

I already tried unrolling the loops (via #pragma optionNV (unroll all)), but that didn't help. I'm using a GTX 670 with the latest drivers on Windows 7.
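For reference, the snippet above can be expanded into a minimal self-contained compute shader that reproduces the measurement. The local size, image binding, and output image name here are illustrative assumptions, not taken from the original code:

```glsl
#version 430

// Illustrative dispatch layout and output target; the original
// question does not specify these.
layout (local_size_x = 16, local_size_y = 16) in;
layout (rgba32f, binding = 0) uniform writeonly image2D outputImage;

// The uniform array whose dynamic indexing appears to be slow.
uniform mat4 someMatrixArray[64];

void main() {
    vec4 result = vec4(0);
    // Dynamic (non-constant) indexing into the uniform array:
    for (int i = 0; i < 2048; i++) {
        result += someMatrixArray[i % 64][0] * 0.01;
    }
    imageStore(outputImage, ivec2(gl_GlobalInvocationID.xy), result);
}
```

With the loop body replaced by a constant (no someMatrixArray access), the same shader runs at the 700+ fps mentioned above.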

Edit:

The generated assembly for the inner loop is:

MAD.F R0.xyz, c[196], {0.0099999998, 0, 0, 0}.x, R0;
Seems like the value of result can be determined statically without needing to be evaluated in the shader. Why not calculate it on the CPU and pass in the final value of result to the shader as a uniform? –  Nicolas Louis Guillemot Jun 26 at 23:21
    
This is only my test case :) My actual code looks different (I'm trying to implement tiled deferred shading). –  TobSpr Jun 27 at 5:21
    
try integer division instead of float multiplication and see what changes you get: result += someMatrixArray[i%64][0] / 100; –  Blue Jul 10 at 15:33
    
Do you get the same result if you replace the matrix array with a single matrix? Could this simply be that looping 2048 times per pixel is expensive? –  Sergio 4 hours ago