Branching is expensive on a GPU. A branch either restricts how the compiler can pack groups of instructions within a thread or, when control flow diverges across multiple threads, introduces cross-thread scheduling restrictions.
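The cost of divergence can be illustrated with a deliberately simplified model. The sketch below assumes that a group of threads running in lockstep must execute every side of a branch that at least one of its threads takes. The function name `warp_cost` and the cycle counts are hypothetical, chosen only for illustration, and do not describe the scheduler of any specific GPU.

```python
def warp_cost(conditions, cost_if_true, cost_if_false):
    """Cycles a group of lockstep threads spends on one if/else.

    Simplified model (an assumption, not a description of real hardware):
    the group runs every path that at least one thread takes, while the
    threads that did not take that path sit idle.
    """
    cost = 0
    if any(conditions):       # at least one thread takes the 'if' side
        cost += cost_if_true
    if not all(conditions):   # at least one thread takes the 'else' side
        cost += cost_if_false
    return cost


# All four threads agree, so only the 'if' side is executed.
uniform = warp_cost([True, True, True, True], cost_if_true=10, cost_if_false=30)

# One thread disagrees, so the whole group pays for both sides.
divergent = warp_cost([True, False, True, True], cost_if_true=10, cost_if_false=30)

print(uniform, divergent)  # prints: 10 40
```

Under this model, a single divergent thread makes the whole group pay for both sides of the branch, which is why divergence between spatially adjacent threads matters.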
You must understand the following concepts:
- Thread scheduling.
How to optimize the use of branches
Try using the following optimizations:
- Minimize the use of complex branches in shader programs.
- Minimize the amount of control-flow divergence in spatially adjacent shader threads.
- Use min(), max(), clamp(), and mix() functions to avoid small branches.
- Weigh the benefit of a branch against the cost of the computation it skips, for example when skipping lights that are beyond a threshold distance from the camera. Often, it is faster to do the computation unconditionally.
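As a rough illustration of replacing small branches with built-in functions, the following Python sketch models the arithmetic of `clamp()` and `mix()` on the CPU. The helper names (`saturate_branchy`, `select_branch_free`, and so on) are hypothetical. The Python `min()` and `max()` calls only stand in for the shader built-ins and say nothing about how a GPU executes them.

```python
def clamp(x, lo, hi):
    """Same definition as the shader built-in: min(max(x, lo), hi)."""
    return min(max(x, lo), hi)


def mix(a, b, t):
    """Same definition as the shader built-in: linear blend of a and b."""
    return a * (1.0 - t) + b * t


def saturate_branchy(x):
    # Two small branches that limit x to the range [0, 1].
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return x


def saturate_branch_free(x):
    # The same result expressed as a single clamp().
    return clamp(x, 0.0, 1.0)


def select_branchy(a, b, use_b):
    # A small branch that picks one of two cheap values.
    if use_b:
        return b
    return a


def select_branch_free(a, b, use_b):
    # The same selection expressed as mix() with a 0.0 or 1.0 weight.
    return mix(a, b, float(use_b))
```

Both forms return the same values for finite inputs. This pattern suits small branches where both inputs are cheap to compute.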
Things to avoid when optimizing your use of branches
Do not implement multiple expensive data paths and then select between them using mix(), because every path is evaluated before mix() picks a result. Branching is usually the best way to minimize the overall cost in this scenario.
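The reason can be seen in a small CPU-side model. In the sketch below, `path_a` and `path_b` are hypothetical stand-ins for two expensive data paths, and the `calls` counter records how often each one runs. The point is only that both arguments to `mix()` must be computed before the selection happens.

```python
calls = {"path_a": 0, "path_b": 0}


def path_a(x):
    """Hypothetical stand-in for an expensive data path."""
    calls["path_a"] += 1
    return x * 2.0


def path_b(x):
    """Hypothetical stand-in for a second expensive data path."""
    calls["path_b"] += 1
    return x + 10.0


def mix(a, b, t):
    """Same definition as the shader built-in: linear blend of a and b."""
    return a * (1.0 - t) + b * t


def shade_with_mix(x, use_b):
    # Both paths are evaluated, then one result is thrown away.
    return mix(path_a(x), path_b(x), float(use_b))


def shade_with_branch(x, use_b):
    # Only the selected path is evaluated.
    if use_b:
        return path_b(x)
    return path_a(x)
```

Calling `shade_with_mix(1.0, False)` increments both counters, while `shade_with_branch(1.0, False)` increments only `calls["path_a"]`. Both return the same value.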
The negative impact of not using branches correctly
You can expect reduced application performance because the shader programs run less efficiently.
How to debug branch-related performance issues
Use the Mali Offline Compiler to measure the impact of your shader code changes. Include an analysis of the shortest and longest paths through the programs. https://developer.arm.com/tools-and-software/graphics-and-gaming/mali-offline-compiler