Optimizing descriptor sets and layouts for Vulkan
Midgard and Bifrost family Mali GPUs support four simultaneously bound descriptor sets at the API level. Internally, however, they require a single physical descriptor table per draw call.
You must understand the following concepts:
- Descriptor sets.
- Binding spaces.
Descriptor sets and layouts
If any of the four source descriptor sets has changed, the driver rebuilds the internal table for the draw call.
The first draw call after a descriptor change has a higher CPU overhead than the following draw calls that reuse the same descriptor sets. Larger descriptor sets are more expensive to rebuild.
With current drivers, descriptor sets allocated from a descriptor pool are not actually pooled internally. Do not call vkAllocateDescriptorSets() on a performance-critical code path.
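One way to keep allocation off the hot path is to allocate every descriptor set at load time and hand them out again once the GPU has finished with them. The following is a minimal sketch under stated assumptions, not a driver-recommended implementation: `SetHandle` is a stand-in for `VkDescriptorSet` so that the code compiles without the Vulkan headers, and `SetCache`, `set_cache_init()`, `set_cache_acquire()`, and the fixed `frames_in_flight` reuse rule are all hypothetical names and choices made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for VkDescriptorSet (assumption: lets the sketch build without Vulkan). */
typedef uint64_t SetHandle;

#define SET_CACHE_CAPACITY 64

typedef struct {
    SetHandle handle;
    uint64_t  free_at_frame; /* first frame at which the set may be reused */
} SetSlot;

typedef struct {
    SetSlot  slots[SET_CACHE_CAPACITY];
    size_t   count;
    uint32_t frames_in_flight; /* assumed upper bound on frames the GPU still reads */
} SetCache;

/* Load time: store handles that were allocated up front,
 * for example by a single vkAllocateDescriptorSets() call. */
void set_cache_init(SetCache *cache, const SetHandle *handles,
                    size_t count, uint32_t frames_in_flight)
{
    assert(count <= SET_CACHE_CAPACITY);
    cache->count = count;
    cache->frames_in_flight = frames_in_flight;
    for (size_t i = 0; i < count; ++i) {
        cache->slots[i].handle = handles[i];
        cache->slots[i].free_at_frame = 0;
    }
}

/* Hot path: return a set that is no longer referenced, or 0 if none is free.
 * No allocation happens here; the caller updates the returned set in place. */
SetHandle set_cache_acquire(SetCache *cache, uint64_t current_frame)
{
    for (size_t i = 0; i < cache->count; ++i) {
        if (cache->slots[i].free_at_frame <= current_frame) {
            cache->slots[i].free_at_frame =
                current_frame + cache->frames_in_flight;
            return cache->slots[i].handle;
        }
    }
    return 0; /* cache exhausted: size it larger at load time */
}
```

In this sketch, a return value of 0 plays the role of a null handle and signals that the cache was sized too small for the workload.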
How to optimize descriptor sets and layouts
Try using the following optimization steps:
- Pack the descriptor set binding space as much as possible.
- Instead of resetting descriptor pools and reallocating new descriptor sets, update descriptor sets that are already allocated, but no longer referenced.
- Reuse pre-allocated descriptor sets, and avoid rewriting them with the same information every time.
- Use VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC or VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC to bind the same UBO or SSBO with different offsets. The alternative is building more descriptor sets.
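The dynamic-offset step above can be sketched as follows. With a dynamic uniform buffer descriptor, one descriptor set can serve many objects: only the byte offset passed through the `pDynamicOffsets` argument of `vkCmdBindDescriptorSets()` changes. Each offset must be a multiple of the device limit `minUniformBufferOffsetAlignment`, which is a power of two. This is a minimal sketch of the offset arithmetic only, with no Vulkan calls; the helper names `align_up()` and `dynamic_offset_for()` are hypothetical, and the example assumes that the alignment limit fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment (must be a power of two). */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Byte offset of the per-object block `index` inside one shared uniform buffer.
 * block_size:    size in bytes of the per-object uniform data
 * min_alignment: assumed to hold minUniformBufferOffsetAlignment */
uint32_t dynamic_offset_for(uint32_t index, uint32_t block_size,
                            uint32_t min_alignment)
{
    return index * align_up(block_size, min_alignment);
}
```

For example, with 80 bytes of per-object data and a 256-byte alignment limit, the stride is 256 bytes, so the fourth object (index 3) is bound at offset 768.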
Descriptor set and layout steps to avoid
Arm recommends that you:
- Do not leave holes in the descriptor set.
- Do not leave unused entries, because copying and merging them still has a computational cost.
- Do not allocate descriptor sets from descriptor pools on performance-critical code paths.
- Do not use DYNAMIC_OFFSET UBOs or SSBOs if you never plan to change the binding offset, because handling the dynamic offset has a small extra cost.
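A quick way to catch holes is to check, when a layout is created, that its binding numbers form a dense range starting at zero. The following is a minimal sketch, assuming the binding numbers have been copied out of the `binding` field of each `VkDescriptorSetLayoutBinding` into a plain array; `bindings_are_packed()` is a hypothetical helper, and the 64-binding limit is an arbitrary choice for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when the binding numbers cover 0..count-1 exactly once each,
 * that is, when the descriptor set has no holes and no duplicates.
 * Sketch limitation: supports at most 64 bindings per set. */
bool bindings_are_packed(const uint32_t *bindings, size_t count)
{
    uint64_t seen = 0;

    if (count > 64) {
        return false;
    }
    for (size_t i = 0; i < count; ++i) {
        if (bindings[i] >= count) {
            return false; /* a number this high implies a gap below it */
        }
        uint64_t bit = (uint64_t)1 << bindings[i];
        if (seen & bit) {
            return false; /* duplicate binding number */
        }
        seen |= bit;
    }
    return true;
}
```

The order of the bindings in the array does not matter; only the set of numbers is checked.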
The negative impact of inefficient descriptor sets and layout
Unoptimized Vulkan descriptor sets and layouts risk increasing the CPU load for draw calls.
Debugging your descriptor sets and layout performance problems
Ways to speed up your debugging process:
- Monitor the pipeline layout for unused entries.
- Monitor for contention-related performance problems on vkAllocateDescriptorSets().
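The first check above can be approximated offline by comparing the bindings that a layout declares with the bindings that the shaders actually use. The following is a minimal sketch under stated assumptions: the list of used bindings is assumed to come from a shader reflection step that this guide does not describe, and `count_unused_bindings()` is a hypothetical helper rather than part of any Vulkan or Arm tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the layout bindings that no shader references.
 * layout_bindings: binding numbers declared in the descriptor set layout
 * used_bindings:   binding numbers referenced by the shaders (assumed to
 *                  come from an external reflection step) */
size_t count_unused_bindings(const uint32_t *layout_bindings, size_t layout_count,
                             const uint32_t *used_bindings, size_t used_count)
{
    size_t unused = 0;

    for (size_t i = 0; i < layout_count; ++i) {
        int found = 0;
        for (size_t j = 0; j < used_count; ++j) {
            if (layout_bindings[i] == used_bindings[j]) {
                found = 1;
                break;
            }
        }
        if (!found) {
            ++unused;
        }
    }
    return unused;
}
```

A non-zero result suggests that the layout carries entries the driver still has to copy and merge for no benefit.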