I have an AMD graphics card, and I want to accelerate Blender using OpenCL. However, Blender does not support the use of OpenCL acceleration with AMD cards. What is the technical explanation for this, and will there be a fix?
NVIDIA hardware and compilers support true function calls, which is important for complex kernels. It seems that AMD hardware or compilers do not support them, or not to the same extent. To see why this is a problem, consider this example. There are 5 places where the shading nodes are executed, and there are 20 places in the shading nodes where Perlin noise is used. Because no true function calls are supported, the compiler must inline the Perlin noise code at every call site, which means 5 × 20 = 100 copies. You can see how this would make the final code size blow up and cause issues for the compiler. Note that V-Ray RT at this time also does not support running its full OpenCL kernel on AMD (only an older and simpler version), and LuxRender with OpenCL is also running into kernel size issues as more features are added. That is a good indication that kernel size is the main issue here.
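The arithmetic above can be sketched with a toy call graph. This is only an illustration, not Cycles or compiler code: the function names and the instruction counts in `sizes` are made-up assumptions, and only the 5 and 20 call-site counts come from the example.

```python
def copies(calls, root, target):
    """Number of copies of `target` emitted when every call under `root` is inlined."""
    if root == target:
        return 1
    return sum(count * copies(calls, callee, target)
               for callee, count in calls.get(root, {}).items())


def inlined_size(sizes, calls, root):
    """Total code size of `root` when every call is expanded in place."""
    total = sizes[root]
    for callee, count in calls.get(root, {}).items():
        total += count * inlined_size(sizes, calls, callee)
    return total


def called_size(sizes, calls, root):
    """Total code size when each reachable function is emitted exactly once."""
    seen, stack = set(), [root]
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(calls.get(fn, {}))
    return sum(sizes[fn] for fn in seen)


# Hypothetical instruction counts; only the call-site counts (5 and 20)
# are taken from the example in the answer.
sizes = {"kernel": 200, "shader_nodes": 1000, "perlin_noise": 300}
calls = {"kernel": {"shader_nodes": 5}, "shader_nodes": {"perlin_noise": 20}}

print(copies(calls, "kernel", "perlin_noise"))  # copies of the noise code
print(inlined_size(sizes, calls, "kernel"))     # size with forced inlining
print(called_size(sizes, calls, "kernel"))      # size with true function calls
```

With these assumed sizes the fully inlined kernel is more than twenty times larger than the version that uses real calls, and the gap grows with every extra nesting level.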
Compiling the kernels fails because of extreme memory requirements. The Cycles kernels are rather big, and the AMD compilers are unable to cope with that. There are two places where changes could be made: the compiler could be improved, or the Cycles code could be reorganised. Both are huge projects and not very likely to happen. So unfortunately we'll have to live with vendor lock-in for Cycles for the time being. You can always look into GPU-accelerated renderers that don't suffer from this, like LuxRender.
I have been looking into this a bit, and I found the answer after reading the source code for part of the AMD OpenCL compiler. AMD open sourced part of their OpenCL compiler in 2012. The compiler translates the Cycles kernel code into an LLVM intermediate representation, and then into AMD's internal language. During this step, it targets a specific virtual, forward-compatible GPU-like architecture with a fixed number of registers. The problem arises because the code that does this translation cannot spill registers. Because of its complexity, Cycles requires more registers at this point in the translation process than are available, and the function that performs the translation aborts with an error. Interestingly enough, this same code contains lines at the point where it finds out it is out of registers.
This does not mean that adding spill support will make Cycles work, but this is the current roadblock people hit when they try to compile with driver Catalyst 13.01 and higher. The page with the commit
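To make the register issue concrete, here is a minimal sketch of a toy allocator with a fixed number of registers. It is not AMD's code: the function name, the exception and the live ranges are all hypothetical, and it spills the newly arriving value rather than picking the best candidate, which a real allocator would not do.

```python
class OutOfRegisters(RuntimeError):
    """Raised when no register is free and spilling is not supported."""


def allocate(live_ranges, num_registers, allow_spill):
    """Assign registers to values given as name -> (start, end), both inclusive.

    Returns (assignment, spilled). Without spill support the function
    aborts as soon as more values are live than there are registers.
    """
    free = list(range(num_registers))
    active = []      # (end, register) for values currently held in registers
    assignment = {}  # name -> register
    spilled = []     # names placed in memory instead of a register

    for name, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1][0]):
        # Release registers whose values are dead before this one starts.
        still_live = []
        for a_end, reg in active:
            if a_end < start:
                free.append(reg)
            else:
                still_live.append((a_end, reg))
        active = still_live

        if free:
            reg = free.pop()
            assignment[name] = reg
            active.append((end, reg))
        elif allow_spill:
            spilled.append(name)  # simplification: spill the new value
        else:
            raise OutOfRegisters(f"no register left for {name!r}")

    return assignment, spilled


# Hypothetical live ranges: three values overlap, but only two registers exist.
ranges = {"a": (0, 5), "b": (1, 4), "c": (2, 3), "d": (6, 7)}

assignment, spilled = allocate(ranges, num_registers=2, allow_spill=True)
print(assignment, spilled)

try:
    allocate(ranges, num_registers=2, allow_spill=False)
except OutOfRegisters as err:
    print("compilation aborted:", err)
```

With spilling enabled the third overlapping value goes to memory and allocation finishes. With it disabled the same input aborts, which mirrors the behaviour described above for a kernel as large as Cycles.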