A common use case for compute shaders is image post-processing. Remember, however, that fragment shaders have access to many fixed-function features in the hardware. These features can improve performance, reduce power consumption, and even reduce memory bandwidth.
You must understand the following concepts:
- Fragment shading.
- Pipeline bubbles.
Here are some advantages to using fragment shading for image processing:
- When you use varying interpolation, texture coordinates are interpolated by fixed-function hardware, which frees up shader cycles for more useful work.
- Writes to memory can be done by the tile-writeback hardware, in parallel with shader execution.
- There is no need to range check imageStore() coordinates. In compute shaders, this check can be a problem when you use workgroups that do not subdivide a frame exactly.
- Framebuffer compression and transaction elimination are possible.
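The range check mentioned in the list above arises because a compute dispatch is sized in whole workgroups. The following C sketch is not taken from this guide. It models the host-side arithmetic only, and the helper names are hypothetical rather than part of any graphics API.

```c
#include <assert.h>

/* Number of workgroups needed to cover `extent` pixels along one axis.
 * Rounds up so that a partial workgroup covers any remainder.
 * Hypothetical helper, not part of any graphics API. */
unsigned group_count(unsigned extent, unsigned local_size)
{
    assert(local_size > 0);
    return (extent + local_size - 1) / local_size;
}

/* Number of invocations along one axis that fall outside the image.
 * In the shader, each of these must skip its imageStore(). */
unsigned excess_invocations(unsigned extent, unsigned local_size)
{
    return group_count(extent, local_size) * local_size - extent;
}
```

For example, assuming a 1920x1080 frame and a 16x16 workgroup, group_count() gives 120 by 68 workgroups. excess_invocations(1080, 16) is 8, so eight rows of invocations lie outside the frame and need the range check, while the width divides exactly and needs none.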
Here are some advantages to using compute for image processing:
- You can sometimes exploit data sets that are shared between neighboring pixels, which avoids extra passes for some algorithms.
- It is easier to work with larger working sets per thread, avoiding extra passes for some algorithms.
- Complicated algorithms such as FFTs, which need multiple fragment render passes, can often be merged into a single compute dispatch.
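To illustrate how neighboring pixels can share data, the following C sketch counts texel loads for one workgroup that applies a square filter window. It is a simple counting model, not a measurement. It assumes a square workgroup, a window of (2 * radius + 1) texels per side, and a hypothetical scheme in which the workgroup loads its tile plus a border into shared memory once. Real performance also depends on caches and on the GPU, so treat the numbers as an intuition aid only.

```c
#include <assert.h>

/* Loads issued when every invocation fetches its whole filter window.
 * `tile` is the workgroup side length in pixels (assumed square). */
unsigned long loads_per_thread_fetch(unsigned tile, unsigned radius)
{
    assert(tile > 0);
    unsigned long window = 2ul * radius + 1ul;
    return (unsigned long)tile * tile * window * window;
}

/* Loads issued when the workgroup cooperatively loads the tile plus a
 * border of `radius` texels on each side, once, into shared memory. */
unsigned long loads_shared_tile(unsigned tile, unsigned radius)
{
    assert(tile > 0);
    unsigned long side = (unsigned long)tile + 2ul * radius;
    return side * side;
}
```

For an assumed 16x16 workgroup and a radius of 2, the model gives 6400 loads when each invocation fetches independently, against 400 loads for the shared tile.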
How to optimize the use of image processing
Try using the following optimization steps:
- Use fragment shaders for simple image processing.
- For more complicated scenarios, try compute shaders and measure their performance.
- Use texture() instead of imageLoad() to read read-only texture data. texture() works with Arm Frame Buffer Compression (AFBC) textures that were rendered by previous fragment passes. Using texture() also balances the load across the GPU pipelines better, because texture() operations use the texture unit, whereas imageLoad() and imageStore() both use the load-store unit. The load-store unit is often already busy in compute shaders with generic memory accesses.
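One practical detail when you move a read from imageLoad() to texture() is the coordinate type: imageLoad() addresses texels with integer coordinates, whereas texture() on a 2D sampler takes normalized coordinates. Sampling at the texel center selects the same texel, assuming a single mip level. The following C sketch models that conversion. The helper name is hypothetical, and in a shader the same arithmetic would be applied to the invocation coordinates.

```c
#include <assert.h>

/* Normalized coordinate of the center of texel `index` along an axis
 * that is `extent` texels long. Hypothetical helper for illustration. */
float texel_center_uv(unsigned index, unsigned extent)
{
    assert(extent > 0 && index < extent);
    return ((float)index + 0.5f) / (float)extent;
}
```

For a 4-texel axis, texel 0 maps to 0.125 and texel 3 maps to 0.875.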
Things to avoid when optimizing your image processing implementation
Arm recommends the following:
- Do not use imageLoad() in compute unless you need coherent reads and writes within the dispatch.
- Do not use compute to process images that were produced by fragment shading. Doing so creates a backwards dependency that can cause a pipeline bubble. If fragment shader outputs are consumed by fragment shaders of later render passes, the render passes flow through the pipeline more cleanly.
The negative impact of not using the correct image processing method
Compute shaders can be slower and less energy-efficient than fragment shaders for simple post-processing workloads. Examples of such workloads include downscaling, upscaling, and blurs.