Efficient render passes with OpenGL ES
Tile-based rendering operates on the concept of render passes. Each render pass has an explicit start and end and produces an output in memory only at the end of the pass.
You must understand the following concepts:
- OpenGL ES rendering APIs.
- Render passes.
- Tile-based rendering.
Render pass handling
At the start of the pass, the tile memory is initialized inside the GPU. At the end of the pass, the required outputs are written back to system memory. The intermediate framebuffer working state lives entirely inside the tile memory.
Efficient render passes
The driver infers the OpenGL ES render passes from framebuffer binding calls, because render passes are not explicit in the API. A render pass for a framebuffer starts when it is bound as the GL_DRAW_FRAMEBUFFER target, and normally ends when the draw framebuffer binding changes to another framebuffer.
How to optimize render passes
Try using the following optimization steps:
- When starting a render pass, clear or invalidate every attachment. This does not apply if the content of a render target is used as the starting point for rendering.
- To clear the tile memory quickly, clear the entire content of the attachment, ensuring that the color, depth, or stencil writes are not masked.
- At the end of the render pass, invalidate any attachments that are not needed outside of the pass, before changing the framebuffer binding to the next FBO.
- When rendering to a subregion of a framebuffer, use a scissor box to restrict the clearing and rendering to the required area.
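The steps above can be sketched as a single offscreen pass. This is a minimal illustration rather than a definitive implementation. It assumes an OpenGL ES 3.0 context is current, that the FBO has one color attachment plus depth and stencil attachments, and that only the color result is needed after the pass. The names render_offscreen_pass, draw_offscreen_scene, offscreen_fbo, and next_fbo are hypothetical.

```c
#include <GLES3/gl3.h>

/* Hypothetical application callback that issues the draw calls for the pass. */
void draw_offscreen_scene(void);

/* Renders one pass into offscreen_fbo, then binds next_fbo.
 * Assumption: depth and stencil are not needed once the pass ends. */
void render_offscreen_pass(GLuint offscreen_fbo, GLuint next_fbo)
{
    /* Binding the draw framebuffer starts the render pass. */
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, offscreen_fbo);

    /* Clear the whole attachment with no writes masked, so the driver
     * can initialize tile memory without reading from system memory. */
    glDisable(GL_SCISSOR_TEST);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);
    glStencilMask(0xFFu);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

    draw_offscreen_scene();

    /* Invalidate the attachments that are not needed outside the pass,
     * while this FBO is still bound, so they are not written back. */
    static const GLenum discard[] = {
        GL_DEPTH_ATTACHMENT,
        GL_STENCIL_ATTACHMENT
    };
    glInvalidateFramebuffer(GL_DRAW_FRAMEBUFFER, 2, discard);

    /* Changing the draw framebuffer binding ends the pass. */
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, next_fbo);
}
```

To render to a subregion only, enable GL_SCISSOR_TEST and set the box with glScissor() before the clear instead of disabling the scissor test. glClear() respects the scissor box, so both the clear and the draws are then restricted to that area.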
Render pass steps to avoid
Arm recommends that you:
- Do not switch back and render to the same FBO multiple times in a frame. Complete each of your render passes in a single glBindFramebuffer() call before moving on to the next.
- Do not split a render pass by using either glFlush() or glFinish().
- Do not create a packed depth-stencil texture, such as D32F_S8, and attach only one of the two components as an attachment.
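The cost of switching back to the same FBO can be illustrated with a small model. This sketch is a simplification of how a driver infers render passes, not a description of any real driver. It assumes that every change of draw framebuffer between consecutive draw calls starts a new pass. The function name count_render_passes and the integer FBO IDs are hypothetical.

```c
#include <stddef.h>

/* Counts the render passes implied by a sequence of draw calls.
 * draw_fbo[i] is the ID of the framebuffer bound as the draw
 * framebuffer when draw call i is issued.
 * Simplified model: a new pass starts at the first draw and at
 * every draw whose target differs from the previous draw's target. */
size_t count_render_passes(const unsigned *draw_fbo, size_t n)
{
    size_t passes = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i == 0 || draw_fbo[i] != draw_fbo[i - 1]) {
            ++passes;
        }
    }
    return passes;
}
```

Under this model, four draws issued in the order FBO 1, 2, 1, 2 produce four render passes, whereas the same draws grouped as 1, 1, 2, 2 produce two. Each extra pass adds its own attachment load at the start and store at the end.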
Negative impacts of inefficient render passes
Incorrect handling of render passes lowers fragment shading performance and increases memory bandwidth, because the GPU must read non-cleared attachments into tile memory at the start of rendering and write out non-invalidated attachments at the end of rendering.
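The scale of this bandwidth cost can be estimated with simple arithmetic. The following sketch assumes uncompressed attachments, so it gives a rough upper bound that ignores any framebuffer compression the GPU applies. The function names and the example figures (1920x1080, 4 bytes per pixel, 60 frames per second) are illustrative assumptions, not measurements.

```c
#include <stdint.h>

/* Bytes moved by one full load or one full store of an attachment,
 * assuming the attachment is stored uncompressed. */
uint64_t attachment_bytes(uint32_t width, uint32_t height,
                          uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Bytes per second spent on avoidable transfers, given how many
 * unnecessary loads and stores happen in each frame. */
uint64_t wasted_bytes_per_second(uint64_t bytes_per_transfer,
                                 uint32_t transfers_per_frame,
                                 uint32_t frames_per_second)
{
    return bytes_per_transfer * transfers_per_frame * frames_per_second;
}
```

For a 1920x1080 attachment at 4 bytes per pixel, attachment_bytes() returns 8,294,400 bytes. One unnecessary load plus one unnecessary store per frame at 60 frames per second then amounts to 995,328,000 bytes per second under these assumptions.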
Debugging render pass issues
Review your API usage of framebuffer binding, clears, draws, and invalidates.