
Could someone tell me why arrayfun is much faster than a for loop on the GPU? (Not on the CPU; a for loop is actually faster on the CPU.)

Arrayfun:

x = parallel.gpu.GPUArray(rand(512,512,64));
count = arrayfun(@(x) x^2, x);

And equivalent For loop:

for i=1:size(x,1)*size(x,2)*size(x,3)
  z(i)=x(i).^2;        
end

Is it because a for loop is not multithreaded on the GPU? Thanks.
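(A side note on the preallocation question raised in the comments below: growing z inside the loop forces a reallocation on every iteration, which can dominate the loop's runtime on its own. A sketch of the preallocated CPU version, using the same array shape as above:)

```matlab
% Sketch (not from the original post): preallocate z so the loop
% only writes into existing memory instead of growing the array.
x = rand(512, 512, 64);
z = zeros(size(x));        % preallocation
for i = 1:numel(x)
    z(i) = x(i)^2;
end
```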

Is the z(i) array preallocated? Also, just curious, what GPU are you using (e.g. NVIDIA GTX680, or some other model number)? – solvingPuzzles Feb 9 '13 at 19:30

I don't think your loops are equivalent. It seems you're squaring every element in an array with your CPU implementation, but performing some sort of count for arrayfun.

Regardless, I think the explanation you're looking for is as follows:

When run on the GPU, your code can be functionally decomposed -- into individual array cells, in this case -- and each cell squared separately. This works because for a given i, the value of x(i)^2 doesn't depend on the values in any of the other cells. What most likely happens is that the array gets decomposed into S buffers, where S is the number of stream processing units your GPU has. Each unit then computes the square of the data in each cell of its buffer. The results are copied back to the original array and returned to count.

Now don't worry: if you're actually counting things, as the variable name count suggests, a similar thing happens. The algorithm most likely partitions the array into similar buffers and, instead of squaring each cell, adds the values together. You can think of the result of this first step as a smaller array to which the same process can be applied recursively to combine the new sums.
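The buffered-reduction idea described above can be sketched in plain MATLAB (the buffer count S here is illustrative, not from the post):

```matlab
% Sketch: partition the data into S buffers, reduce each buffer
% independently (the parallelizable step), then combine the partial
% results in a final, much smaller reduction.
data = rand(1, 4096);
S = 16;                          % e.g. number of stream processing units
buf = reshape(data, [], S);      % one column per buffer
partial = sum(buf, 1);           % each column reduced independently
total = sum(partial);            % final reduction over the partial sums
```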

Thanks David, very clear. This is probably part of what's happening, but I guess there must be something else too. On the GPU, the speed gain of arrayfun over a for loop is >100x, and the GPU I am using only has 16 compute units (with 32 cores each) ... I will do some more testing. – Maiss Apr 14 '12 at 5:03
    
Keep in mind, your GPU is not identical to your processor either. In short, your GPU is optimized for certain types of calculations (such as floating point operations and very quick integer arithmetic). Further, the memory timing is faster on most current GPUs. The speed increase may not only be attributed to the parallelism, but to the relative locality of the GPU memory. You can do horrible things to the running time of array walking algorithms on a CPU if you access the elements in a bad (cache exploding) order. – dcow Apr 14 '12 at 5:13
@Maiss: I've heard that 100 to 1 number a lot for GPU/CPU difference, in the event that the GPU can be properly used. This is definitely one of those times. – PearsonArtPhoto Apr 14 '12 at 12:50
    
Actually it is ~3000-4000x (with an i7 950 and a GTX 580), which I doubt is the true GPU/CPU difference. The problem must come from the way for/arrayfun operate. I can tell that arrayfun (the GPU implementation) may distribute the work equally across all the multiprocessing units (16) and then across the cores (32 for each MP unit). But that still doesn't explain the difference. Perhaps arrayfun uses some low-level C code. – Maiss Apr 14 '12 at 17:34
    
@Maiss The functions are almost certainly compiled to C code first. What we're saying is you shouldn't be so surprised at the difference. That's why there's so much buzz about GPU computing right now. – dcow Apr 14 '12 at 17:47

As per the reference page here http://www.mathworks.co.uk/help/toolbox/distcomp/arrayfun.html, "the MATLAB function passed in for evaluation is compiled for the GPU, and then executed on the GPU". In the explicit for loop version, each operation is executed separately on the GPU, and this incurs overhead - the arrayfun version is one single GPU kernel invocation.
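A rough way to see the single-kernel effect is to time the fused arrayfun version against a built-in vectorized operation, both of which compile to one GPU kernel invocation (a sketch; gputimeit requires a newer MATLAB release than the one in the original post, and actual timings will vary by hardware):

```matlab
% Sketch: both expressions below run as a single GPU kernel, so they
% avoid the per-element launch overhead the explicit for loop incurs.
x = gpuArray(rand(512, 512, 64));

tArrayfun   = gputimeit(@() arrayfun(@(v) v^2, x));  % one compiled kernel
tVectorized = gputimeit(@() x.^2);                   % also a single kernel
```

An element-by-element loop over the same gpuArray would instead launch a tiny GPU operation per iteration, which is where the large slowdown comes from.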
