Efficient render passes with Vulkan
Tile-based rendering operates on the concept of render passes. Each render pass has an explicit start and end and produces an output in memory only at the end of the pass.
You must understand the following concepts:
- Vulkan rendering APIs.
- Render passes.
- Tile-based rendering.
Render pass handling
At the start of the pass, the tile memory is initialized inside the GPU. At the end of the pass, the required outputs are written back to system memory. The intermediate framebuffer working state lives entirely inside the tile memory.
Efficient render passes
Unlike in OpenGL ES, render passes are explicit in the Vulkan API. Each attachment declares a loadOp and a storeOp operation. loadOp defines how the GPU initializes the tile memory at the start of the pass. storeOp defines what is written back to memory at the end of the pass.
Vulkan introduces lazily allocated memory, meaning that transient attachments that exist only for the duration of a single render pass do not need physical storage.
How to optimize render passes
Try using the following optimization steps:
- Clear or invalidate each attachment at the start of a render pass using loadOp = LOAD_OP_CLEAR or loadOp = LOAD_OP_DONT_CARE.
- Set up any attachment that is only live during a single render pass as a TRANSIENT_ATTACHMENT that is backed by LAZILY_ALLOCATED memory.
- Ensure that the contents are invalidated at the end of the render pass using storeOp = STORE_OP_DONT_CARE.
Render pass steps to avoid
Arm recommends that you:
- Do not clear an attachment inside a render pass using vkCmdClearAttachments(). This is not free, unlike a clear or invalidate load operation.
- Do not write a constant color using a shader program to manually clear a render pass.
- Do not use loadOp = LOAD_OP_LOAD unless your algorithm relies on the initial framebuffer state.
- Do not set storeOp for attachments that are not needed in the render pass. Doing so generates a needless round trip through the tile memory for that attachment.
- Do not use vkCmdBlitImage as a way of upscaling a low-resolution game frame to native resolution, especially if you then render the UI or HUD directly on top of the frame with loadOp = LOAD_OP_LOAD. This is an unnecessary round trip to memory.
Negative impacts of inefficient render passes
Incorrect handling of render passes reduces fragment shading performance and increases memory bandwidth use.
Debugging render pass issues
Review the API usage of render pass creation and any use of vkCmdClearColorImage(), vkCmdClearDepthStencilImage(), and vkCmdClearAttachments().