Efficient render passes with Vulkan

Tile-based rendering operates on the concept of render passes. Each render pass has an explicit start and end and produces an output in memory only at the end of the pass.


You must understand the following concepts:

  • Vulkan rendering APIs.
  • Render passes.
  • Tile-based rendering.

Render pass handling

At the start of the pass, the tile memory is initialized inside the GPU. At the end of the pass, the required outputs are written back to system memory. The intermediate framebuffer working state lives entirely inside the tile memory.

Efficient render passes

Unlike OpenGL ES, Vulkan makes render passes explicit in the API. Each attachment has defined loadOp and storeOp operations: loadOp defines how the GPU initializes the tile memory at the start of the pass, and storeOp defines what is written back to memory at the end of the pass.
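To make the effect of loadOp and storeOp concrete, the following C sketch estimates the memory traffic for one attachment over one render pass. It is a simplified model, not a measurement. The enum names are local stand-ins that use this document's shorthand for VK_ATTACHMENT_LOAD_OP_* and VK_ATTACHMENT_STORE_OP_*, the function attachment_traffic_bytes() is hypothetical, and the model assumes uncompressed data with the whole attachment loaded or stored.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins using this document's shorthand. In real Vulkan code they
   correspond to VK_ATTACHMENT_LOAD_OP_* and VK_ATTACHMENT_STORE_OP_*. */
typedef enum { LOAD_OP_LOAD, LOAD_OP_CLEAR, LOAD_OP_DONT_CARE } LoadOp;
typedef enum { STORE_OP_STORE, STORE_OP_DONT_CARE } StoreOp;

/* Simplified model: bytes moved between system memory and tile memory for
   one attachment over one render pass. Assumes uncompressed data and that
   the whole attachment is loaded or stored; real hardware may move less. */
uint64_t attachment_traffic_bytes(uint32_t width, uint32_t height,
                                  uint32_t bytes_per_pixel,
                                  LoadOp load_op, StoreOp store_op)
{
    assert(bytes_per_pixel > 0);
    uint64_t size = (uint64_t)width * height * bytes_per_pixel;
    uint64_t traffic = 0;

    if (load_op == LOAD_OP_LOAD)
        traffic += size;  /* previous contents are read at the start */
    /* LOAD_OP_CLEAR and LOAD_OP_DONT_CARE are modeled as needing no read:
       the tile memory is initialized inside the GPU. */

    if (store_op == STORE_OP_STORE)
        traffic += size;  /* results are written back at the end */
    /* STORE_OP_DONT_CARE is modeled as writing nothing back. */

    return traffic;
}
```

Under these assumptions, a 1920x1080 attachment with 4 bytes per pixel moves about 16.6 MB per pass with LOAD_OP_LOAD and STORE_OP_STORE, about 8.3 MB with LOAD_OP_CLEAR and STORE_OP_STORE, and nothing when both operations are DONT_CARE.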

Vulkan also introduces lazily allocated memory, which means that transient attachments that exist only for the duration of a single render pass do not need physical storage.

How to optimize render passes

Try using the following optimization steps:

  • Clear or invalidate each attachment at the start of a render pass using loadOp = LOAD_OP_CLEAR or loadOp = LOAD_OP_DONT_CARE.
  • Set up any attachment that is only live during a single render pass as a TRANSIENT_ATTACHMENT that is backed by LAZILY_ALLOCATED memory.
  • Ensure that the contents of any attachment that is not needed after the render pass, such as a transient attachment, are invalidated at the end of the render pass using storeOp = STORE_OP_DONT_CARE.
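The steps above can be sketched as a small decision function. This is an illustration under stated assumptions rather than a definitive implementation: the AttachmentPlan struct, the plan_attachment() function, and its three input flags are hypothetical, and the enum names are local stand-ins for the Vulkan VK_ATTACHMENT_LOAD_OP_* and VK_ATTACHMENT_STORE_OP_* values.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for VK_ATTACHMENT_LOAD_OP_* and VK_ATTACHMENT_STORE_OP_*. */
typedef enum { LOAD_OP_LOAD, LOAD_OP_CLEAR, LOAD_OP_DONT_CARE } LoadOp;
typedef enum { STORE_OP_STORE, STORE_OP_DONT_CARE } StoreOp;

/* Hypothetical summary of how one attachment should be set up. */
typedef struct {
    LoadOp  load_op;
    StoreOp store_op;
    bool    transient;  /* TRANSIENT_ATTACHMENT backed by LAZILY_ALLOCATED memory */
} AttachmentPlan;

/* Choose settings from what the algorithm actually needs:
   - needs_previous_contents: the pass relies on the initial framebuffer state
   - needs_clear_value:       the pass relies on a known starting value
   - read_after_pass:         something reads the attachment after the pass */
AttachmentPlan plan_attachment(bool needs_previous_contents,
                               bool needs_clear_value,
                               bool read_after_pass)
{
    AttachmentPlan plan;

    if (needs_previous_contents)
        plan.load_op = LOAD_OP_LOAD;       /* only when truly required */
    else if (needs_clear_value)
        plan.load_op = LOAD_OP_CLEAR;
    else
        plan.load_op = LOAD_OP_DONT_CARE;

    plan.store_op = read_after_pass ? STORE_OP_STORE : STORE_OP_DONT_CARE;

    /* An attachment that is only live during this pass can be transient. */
    plan.transient = !needs_previous_contents && !read_after_pass;

    /* Sanity check: a transient attachment is never stored. */
    assert(!(plan.transient && plan.store_op == STORE_OP_STORE));
    return plan;
}
```

For example, a depth buffer that is cleared at the start and never read after the pass would be planned as LOAD_OP_CLEAR, STORE_OP_DONT_CARE, and transient, while a color target that is presented afterwards would keep STORE_OP_STORE and would not be transient.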

Render pass steps to avoid

Arm recommends the following:

  • Do not clear an attachment inside a render pass using vkCmdClearAttachments(). This is not free, unlike a clear or invalidate loadOp operation.
  • Do not write a constant color using a shader program to manually clear a render pass.
  • Do not use loadOp = LOAD_OP_LOAD unless your algorithm relies on the initial framebuffer state.
  • Do not set loadOp = LOAD_OP_LOAD or storeOp = STORE_OP_STORE for attachments that are not needed in the render pass. Doing so generates a needless round trip through the tile memory for that attachment.
  • Do not use vkCmdBlitImage() to upscale a low-resolution game frame to native resolution, especially if you then render the UI or HUD directly on top of the frame with loadOp = LOAD_OP_LOAD, because this causes an unnecessary round trip to memory.

Negative impacts of inefficient render passes

Incorrect handling of render passes reduces fragment shading performance and increases memory bandwidth use.

Debugging render pass issues

Review how your application creates render passes, and review any use of vkCmdClearColorImage(), vkCmdClearDepthStencilImage(), and vkCmdClearAttachments().
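One way to start such a review is to count explicit clear commands in a recorded command stream. The following C sketch assumes a hypothetical trace format: the CommandKind enum, the ClearReport struct, and count_explicit_clears() are all invented for illustration, and a real tool would build the trace from an API capture instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trace entries, one per recorded command. */
typedef enum {
    CMD_BEGIN_RENDER_PASS,
    CMD_END_RENDER_PASS,
    CMD_CLEAR_ATTACHMENTS,          /* vkCmdClearAttachments() */
    CMD_CLEAR_COLOR_IMAGE,          /* vkCmdClearColorImage() */
    CMD_CLEAR_DEPTH_STENCIL_IMAGE,  /* vkCmdClearDepthStencilImage() */
    CMD_DRAW
} CommandKind;

typedef struct {
    size_t clears_inside_pass;   /* attachment clears between begin and end */
    size_t clears_outside_pass;  /* image clears recorded outside a pass */
} ClearReport;

/* Count the explicit clears that are worth reviewing. */
ClearReport count_explicit_clears(const CommandKind *cmds, size_t count)
{
    assert(cmds != NULL || count == 0);
    ClearReport report = {0, 0};
    int inside_pass = 0;

    for (size_t i = 0; i < count; ++i) {
        switch (cmds[i]) {
        case CMD_BEGIN_RENDER_PASS:
            inside_pass = 1;
            break;
        case CMD_END_RENDER_PASS:
            inside_pass = 0;
            break;
        case CMD_CLEAR_ATTACHMENTS:
            if (inside_pass)
                report.clears_inside_pass++;
            break;
        case CMD_CLEAR_COLOR_IMAGE:
        case CMD_CLEAR_DEPTH_STENCIL_IMAGE:
            report.clears_outside_pass++;
            break;
        default:
            break;
        }
    }
    return report;
}
```

The counts are prompts for review, not proof of a problem. A nonzero clears_inside_pass points at vkCmdClearAttachments() calls, and a nonzero clears_outside_pass points at image clears that a loadOp = LOAD_OP_CLEAR might be able to replace.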