Vulkan graphics rendering is organized into render passes and subpasses. This article provides an introduction to these concepts and how to use them in the Vulkan API. If you haven’t done so, it is recommended that you read the article on ‘GPU Framebuffer Memory’ before reading this article.
Render Passes
When a GPU renders a scene, it is configured with one or more render targets, or framebuffer attachments in Khronos terminology. The size and format of the attachments determine how graphics work is configured across the parallelism available on all modern GPUs. For example, on a tile-based renderer, the set of attachments is used to determine the way the image is divided into tiles. In Vulkan, a render pass is the set of attachments, the way they are used, and the rendering work that is performed using them. In a traditional API, a change to a new render pass might correspond to binding a new framebuffer.
Subpasses
During normal rendering, it is not possible for a fragment shader to access the attachments to which it is currently rendering: GPUs have optimized hardware for writing to the attachments, and reading them back would interfere with this. However, some common rendering techniques such as deferred shading rely on being able to access the result of previous rendering during shading. For a tile-based renderer, the results of previous rendering can efficiently stay on-chip if subsequent rendering operations are at the same resolution, and if only the data in the pixel currently being rendered is needed (accessing different pixels may require access to values outside the current tile, which breaks this optimization). In order to help optimize deferred shading on tile-based renderers, Vulkan splits the rendering operations of a render pass into subpasses. All subpasses in a render pass share the same resolution and tile arrangement, and as a result, they can access the results of previous subpasses.
In Vulkan, a render pass consists of one or more subpasses; for simple rendering operations, there may be only a single subpass in a render pass.
Creating a VkRenderPass
In Vulkan, a render pass is described by an (opaque) VkRenderPass object. This provides a template that is used when beginning a render pass inside a command buffer. The render pass is used with a compatible VkFramebuffer object, which represents the set of images that will be used as attachments during execution of the render pass.
vkCreateRenderPass
Like many driver objects in Vulkan, a VkRenderPass object is created with a corresponding create function, vkCreateRenderPass(), which takes the following parameters:
VkDevice device: Logical device used for rendering (from vkCreateDevice)
const VkRenderPassCreateInfo* pCreateInfo: Parameters for creation
const VkAllocationCallbacks* pAllocator: Host memory allocation callback (can be NULL)
VkRenderPass* pRenderPass: Resulting render pass handle
As with many Vulkan creation functions, most parameters are passed through a creation structure. This approach makes it more efficient to create multiple identical objects, and provides a way to support type-safe additional parameters through extensions.
Many creation functions in Vulkan offer a callback for applications that wish to track host-side memory usage. While this is important for applications that need precise control over resource allocation, and useful for debugging, in most cases the callback can be left as NULL to rely on the driver's default memory allocation scheme.
As with other Vulkan creation functions, the function returns an error code if anything goes wrong - although more information may be available through validation layers if the problem is an application error. The newly-created render pass description is returned via the pRenderPass pointer.
The interesting parameters are contained in the pCreateInfo structure.
VkStructureType sType: Used for type safety and extensions, must be VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO
const void* pNext: Allows extensions to provide extra parameters, must be NULL if not needed by an extension
VkRenderPassCreateFlags flags: Reserved for future use (should be 0)
uint32_t attachmentCount: Number of framebuffer attachments used in this render pass (across all subpasses)
const VkAttachmentDescription* pAttachments: Description of the attachments (array of size attachmentCount)
uint32_t subpassCount: Number of subpasses
const VkSubpassDescription* pSubpasses: Description of subpasses (array of size subpassCount)
uint32_t dependencyCount: Number of dependencies between subpass pairs
const VkSubpassDependency* pDependencies: Descriptions of dependencies between subpasses (array of size dependencyCount)
For the purposes of this article, we will begin with a simple rendering operation with only a single subpass (a render pass always consists of at least one subpass). In this case, subpassCount can be 1 and dependencyCount can be 0, so pDependencies can be NULL (we'll come back to the other uses of dependencies below).
VkAttachmentDescription
An attachment corresponds to a single Vulkan VkImageView. A description of the attachment is provided to the render pass creation, which allows the render pass to be configured appropriately; the actual images to be used are provided when the render pass is used, via the VkFramebuffer. It is possible to associate multiple attachments with a render pass; these may be used, for example, as multiple render targets, or in separate subpasses. Most commonly, the color buffer and the depth buffer are separate attachments in Vulkan. Therefore the pAttachments member of VkRenderPassCreateInfo points to an array of attachmentCount elements, each a VkAttachmentDescription:
VkAttachmentDescriptionFlags flags: Additional attachment properties (0 in our examples)
VkFormat format: Format of the image view that will be used for the attachment
VkSampleCountFlagBits samples: Number of samples in the attachment (used for multi-sampling)
VkAttachmentLoadOp loadOp: What should be done with the attachment contents before rendering
VkAttachmentStoreOp storeOp: What should be done with the attachment after rendering
VkAttachmentLoadOp stencilLoadOp: In the case of a depth/stencil attachment, how to access the stencil contents before rendering
VkAttachmentStoreOp stencilStoreOp: In the case of a depth/stencil attachment, what should be done with the stencil after rendering
VkImageLayout initialLayout: Layout of the attachment when first used in the render pass
VkImageLayout finalLayout: Layout of the attachment after use in the render pass
For a simple rendering operation, we might decide to create two attachments:
Color attachment (pAttachments[0]):
  flags: 0
  format: VK_FORMAT_B8G8R8A8_UNORM
  samples: VK_SAMPLE_COUNT_1_BIT
  loadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE
  storeOp: VK_ATTACHMENT_STORE_OP_STORE
  stencilLoadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE
  stencilStoreOp: VK_ATTACHMENT_STORE_OP_DONT_CARE
  initialLayout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
  finalLayout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
Depth attachment (pAttachments[1]):
  flags: 0
  format: VK_FORMAT_D16_UNORM
  samples: VK_SAMPLE_COUNT_1_BIT
  loadOp: VK_ATTACHMENT_LOAD_OP_CLEAR
  storeOp: VK_ATTACHMENT_STORE_OP_DONT_CARE
  stencilLoadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE
  stencilStoreOp: VK_ATTACHMENT_STORE_OP_DONT_CARE
  initialLayout: VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
  finalLayout: VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
Stencil has its own load and store ops because a combined depth/stencil image is a single attachment. Here, we aren't using stencil, so stencilLoadOp and stencilStoreOp are irrelevant. Note that "DONT_CARE" ops do not guarantee that memory is left untouched: a tile-based renderer may avoid memory accesses for them entirely, but an immediate-mode renderer may still use the attachment memory during rendering. Similarly, a "DONT_CARE" load op avoids the need to read the previous framebuffer contents on a tiler, and also avoids the need to perform an explicit clear of the memory, which may be costly for an immediate-mode renderer.
Note: We're assuming that the images have been transitioned from VK_IMAGE_LAYOUT_UNDEFINED (on creation) to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL and VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL before we use them, for example by using a VkImageMemoryBarrier.
There is a complication to this mechanism to be aware of. Consider drawing a scene with two render passes, the second of which reads the results of the first (written with STORE_OP_STORE) as an input attachment (with LOAD_OP_LOAD) but does not write to it. If the contents of this attachment are still wanted after the second render pass, the second render pass must also use STORE_OP_STORE for it: with STORE_OP_DONT_CARE, some hardware will optimize by discarding the attachment contents after the second render pass, even though the first render pass used STORE_OP_STORE. You may think of this as a cache discard of the output of the first render pass, where the cache line was previously considered to be valid. This is potentially a good performance enhancement, but it does mean that applications need to be prepared for surprising behavior!
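VkSubpassDescription
Each subpass is described by a VkSubpassDescription structure, in the array pointed to by the pSubpasses member of VkRenderPassCreateInfo:
VkSubpassDescriptionFlags flags: Subpass flags (0 in our examples)
VkPipelineBindPoint pipelineBindPoint: Type of pipeline used in this subpass (VK_PIPELINE_BIND_POINT_GRAPHICS)
uint32_t inputAttachmentCount: Number of input attachments read by this subpass
const VkAttachmentReference* pInputAttachments: Array of input attachments read by this subpass (array of size inputAttachmentCount)
uint32_t colorAttachmentCount: Number of color (output) attachments for this subpass
const VkAttachmentReference* pColorAttachments: Array of color attachments written to by this subpass (array of size colorAttachmentCount)
const VkAttachmentReference* pResolveAttachments: Resolve targets for multi-sampled color attachments, used for antialiasing (NULL or array of size colorAttachmentCount)
const VkAttachmentReference* pDepthStencilAttachment: One attachment reference describing the depth/stencil attachment
uint32_t preserveAttachmentCount: Number of attachments preserved across this subpass
const uint32_t* pPreserveAttachments: Array of attachment indices preserved across this subpass, of size preserveAttachmentCount, or NULL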
In our first example, we only have a single subpass, and we'll render to it directly. We won't use pResolveAttachments (so we can set it to NULL), and we do not need to preserve any attachments (so preserveAttachmentCount can be 0 and pPreserveAttachments can be NULL). The fields we don't need yet are described in more detail below. Before we can configure our single subpass, there is one more level of Vulkan structure to deal with, the VkAttachmentReference:
uint32_t attachment: Index of the attachment in the pAttachments array of VkRenderPassCreateInfo (or VK_ATTACHMENT_UNUSED)
VkImageLayout layout: Layout the attachment will be in during this subpass
In total, then, our simple render pass looks like this:
*pCreateInfo
  sType: VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO
  pNext: NULL
  flags: 0
  attachmentCount: 2
  pAttachments:
    pAttachments[0]:
      flags: 0
      format: VK_FORMAT_B8G8R8A8_UNORM
      samples: VK_SAMPLE_COUNT_1_BIT
      loadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE
      storeOp: VK_ATTACHMENT_STORE_OP_STORE
      stencilLoadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE
      stencilStoreOp: VK_ATTACHMENT_STORE_OP_DONT_CARE
      initialLayout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
      finalLayout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
    pAttachments[1]:
      flags: 0
      format: VK_FORMAT_D16_UNORM
      samples: VK_SAMPLE_COUNT_1_BIT
      loadOp: VK_ATTACHMENT_LOAD_OP_CLEAR
      storeOp: VK_ATTACHMENT_STORE_OP_DONT_CARE
      stencilLoadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE
      stencilStoreOp: VK_ATTACHMENT_STORE_OP_DONT_CARE
      initialLayout: VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
      finalLayout: VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
  subpassCount: 1
  pSubpasses:
    pSubpasses[0]:
      flags: 0
      pipelineBindPoint: VK_PIPELINE_BIND_POINT_GRAPHICS
      inputAttachmentCount: 0
      pInputAttachments: NULL
      colorAttachmentCount: 1
      pColorAttachments:
        pColorAttachments[0]:
          attachment: 0
          layout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
      pResolveAttachments: NULL
      pDepthStencilAttachment:
        *pDepthStencilAttachment:
          attachment: 1
          layout: VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
      preserveAttachmentCount: 0
      pPreserveAttachments: NULL
  dependencyCount: 0
  pDependencies: NULL
Fortunately, since render passes can be reused, you may not need to do this too often. We'll see the flexibility this mechanism provides later.
Creating a VkFramebuffer
A VkRenderPass is a template for how a render pass will be used. When we use the render pass, we need to provide the actual images which are to be used for rendering. The mechanism containing references to the actual images is a VkFramebuffer, which contains all the attachments used by the render pass.
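vkCreateFramebuffer
As with vkCreateRenderPass for a VkRenderPass, a VkFramebuffer is created with vkCreateFramebuffer(). Its parameters mirror those of vkCreateRenderPass, and the interesting ones are in the VkFramebufferCreateInfo structure: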
VkStructureType sType: Used for type safety and extensions, must be VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO
const void* pNext: Used for extensions; NULL if no extensions are used to add parameters
VkFramebufferCreateFlags flags: Reserved for future use, must be 0
VkRenderPass renderPass: The render pass (or a compatible one) with which the framebuffer will be used
uint32_t attachmentCount: Number of attachments used in the render pass
const VkImageView* pAttachments: Array of image views, which refer to actual images; array is of size attachmentCount
uint32_t width: Width of framebuffer
uint32_t height: Height of framebuffer
uint32_t layers: Number of layers in framebuffer
Note that the framebuffer has a single width, height and number of layers, which apply to all of its attachments (each attachment must be at least this large). These dimensions are independent of the render pass, so the same render pass can be used with framebuffers of different sizes.
For our simple example, we need two image views: one referring to a VK_FORMAT_B8G8R8A8_UNORM image and one referring to a VK_FORMAT_D16_UNORM image. For efficiency, since we typically don't need the depth buffer to persist after rendering, the D16 image can be created with VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT in its usage flags, and bound to memory from a memory type that has VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT set. In this case, a tile-based renderer may be able to avoid allocating any memory for the depth buffer, since it is only used for rendering operations which occur on-chip.
Using a VkRenderPass
Now that we have a VkRenderPass and a VkFramebuffer, we can use them in the rendering process.
vkCmdBeginRenderPass
To begin a render pass instance in a command buffer, call vkCmdBeginRenderPass():
VkCommandBuffer commandBuffer: Command buffer into which to insert the render pass
const VkRenderPassBeginInfo* pRenderPassBegin: Render pass, framebuffer, render area and clear values to use
VkSubpassContents contents: Indication whether secondary command buffers are in use (see below)
A render pass can only begin (and end) in a primary command buffer.
Once a render pass has begun on a command buffer, subsequent commands recorded into that command buffer will execute within the first (and in the case of our example, only) subpass of the render pass instance. In our simple case, we could use just the one command buffer and record rendering commands directly into it. In this case, contents should be VK_SUBPASS_CONTENTS_INLINE.
VkRenderPassBeginInfo
As with many functions, Vulkan uses an info structure for reusability and extensibility.
VkStructureType sType: Used for type safety and extensions, must be VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO
const void* pNext: Used for extensions; must be NULL if no extension is used which adds to this struct
VkRenderPass renderPass: The render pass description created by vkCreateRenderPass()
VkFramebuffer framebuffer: The framebuffer containing the images for rendering, created by vkCreateFramebuffer()
VkRect2D renderArea: Bounds of the rectangular area affected by the render pass
uint32_t clearValueCount: Number of clear values
const VkClearValue* pClearValues: Values used for clearing attachments (array of size clearValueCount)
renderArea is used for rendering to a subset of the framebuffer, for example for partial updates of dirty areas of the screen. The application is responsible for clipping rendering to this area, and rendering to less than the entire framebuffer can incur a performance penalty if the render area is not aligned to the granularity returned by vkGetRenderAreaGranularity(); for a tile-based renderer, this granularity might be expected to correspond to the tile grid. For most purposes, the render area can be set to the full width and height of the framebuffer.
pClearValues is indexed by attachment number, and an entry is used only if the corresponding attachment has a loadOp of VK_ATTACHMENT_LOAD_OP_CLEAR. In our simple example, we clear the depth attachment at the start of rendering, and the depth attachment is at index 1 in our attachment array, so pClearValues needs at least two entries (clearValueCount of 2), with pClearValues[1] holding the value to which we want to clear the depth buffer; pClearValues[0] is ignored because the color attachment is not cleared.
union VkClearValue
VkClearColorValue color: Value used when clearing color buffers
VkClearDepthStencilValue depthStencil: Value used when clearing depth/stencil buffers
VkClearColorValue is a union of four-element arrays of different channel types (float32, int32 and uint32), with the member chosen according to the format of the attachment being cleared. VkClearDepthStencilValue always has a float depth value and a uint32_t stencil value. For our simple example, only the float depth value is relevant, and it should be set to the depth value we want for our rendering.
vkCmdEndRenderPass
After the last rendering commands for the render pass instance have been recorded into the command buffer, the application must end the render pass instance with vkCmdEndRenderPass(), which takes only the command buffer:
VkCommandBuffer commandBuffer: Command buffer in which the render pass instance was begun
In this example, if we have been recording commands directly into the primary command buffer, the command buffer looks like this:
Command buffer
  Previous render pass...
  Current render pass
    vkCmdBeginRenderPass()
    vkCmdBind*...
    vkCmdDraw*... etc.
    vkCmdEndRenderPass()
  Next render pass...
Multiple render passes can be inserted into the same command buffer, so long as one is ended before the next is begun. A render pass must both begin and end within a single primary command buffer (that is, a render pass cannot span multiple primary command buffers), so parallelism in command buffer building in this approach relies on parallel building of multiple render passes. In many rendering frameworks, this level of parallelism is still enough to allow the CPU cores to stay busy, and simplifies the task of resource management and state tracking.
Render passes and secondary command buffers
In some rendering scenarios, a large amount of work needs to be performed within a single rendering pass. For example, a large number of characters may be managed and animated by their own threads, but all appear on screen at once. This complicates the task of optimizing rendering order and minimizing state changes, but can still be necessary in some highly-parallel systems.
Vulkan's solution to this is to make use of secondary command buffers, which (for graphics rendering) are executed inside a render pass. A secondary command buffer is created by vkAllocateCommandBuffers() using a VkCommandBufferAllocateInfo with a level member of VK_COMMAND_BUFFER_LEVEL_SECONDARY.
Beginning a secondary command buffer
For graphics, the VkCommandBufferBeginInfo argument of vkBeginCommandBuffer, when beginning recording of a secondary command buffer, must have a valid pInheritanceInfo field. Its flags member may contain the following bits:
VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT: Set if the command buffer will only ever be used once (potential optimization)
VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT: Must be set for secondary graphics command buffers
VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT: Set if the command buffer will be submitted more than once concurrently (set only if needed when reusing command buffers)
Secondary command buffers can also be used for compute, and in this case their operations do not fall within a render pass. For graphics, we must set RENDER_PASS_CONTINUE_BIT, may be able to set ONE_TIME_SUBMIT_BIT, and may need to set SIMULTANEOUS_USE_BIT. These options affect the way secondary command buffers are implemented: for example, they may determine whether a separate copy of a secondary command buffer must be made before use, or whether the existing copy can be used indirectly.
VkCommandBufferBeginInfo::pInheritanceInfo
pInheritanceInfo is used to allow the secondary command buffer to be configured correctly for the render pass:
VkStructureType sType: Used for type safety and extensions, must be VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO
const void* pNext: Used for extensions, and should be NULL unless needed
VkRenderPass renderPass: The render pass (or a compatible one) which will be active when the command buffer is used
uint32_t subpass: The subpass of the render pass that this command buffer will be used in
VkFramebuffer framebuffer: The framebuffer to be used (if known), or VK_NULL_HANDLE if unknown
VkBool32 occlusionQueryEnable: Should be VK_TRUE if the primary command buffer might have an occlusion query active, and VK_FALSE otherwise
VkQueryControlFlags queryFlags: Query control flags that can be used by an active occlusion query in the primary command buffer when this secondary command buffer executes; 0 if unused
VkQueryPipelineStatisticFlags pipelineStatistics: Set of pipeline statistics that can be counted by an active query; 0 if pipeline statistics queries are disabled
If the framebuffer is known at the time the command buffer is recorded (for example, if the same framebuffer is always used for generating a shadow map) then providing an explicit framebuffer may be more efficient; otherwise (if the framebuffer argument is VK_NULL_HANDLE) the framebuffer is determined by the render pass in the primary command buffer, which allows secondary command buffers to be reused with different (compatible) framebuffers determined by the primary command buffer that is using the secondary command buffer.
Rendering commands are recorded into the secondary command buffer in the same way as for a primary command buffer, and having multiple secondary command buffers allows multiple threads to record rendering commands concurrently without need for synchronization.
Invoking a secondary command buffer
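When the secondary command buffers have been recorded, they can be invoked in a "parent" primary command buffer with vkCmdExecuteCommands(). For this, the current subpass must have been begun with VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS rather than VK_SUBPASS_CONTENTS_INLINE.
VkCommandBuffer commandBuffer: Primary command buffer in which the secondary command buffers are executed
uint32_t commandBufferCount: Number of secondary command buffers
const VkCommandBuffer* pCommandBuffers: Array of commandBufferCount secondary command buffers to execute (in increasing array index order)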
Secondary command buffers inside a subpass
Using the above techniques, work may be distributed as in the following example:
Thread 1:
  Record secondary command buffer A (frame 2)
  Record secondary command buffer A (frame 3)
  ...
Thread 2:
  Record secondary command buffer B (frame 2)
  Record secondary command buffer B (frame 3)
  ...
Thread 3:
  Record secondary command buffer C (frame 2)
  Record secondary command buffer C (frame 3)
  ...
Thread 4:
  Primary command buffer (frame 1)
    vkCmdBeginRenderPass()
    vkCmdExecuteCommands(A)
    vkCmdExecuteCommands(B)
    vkCmdExecuteCommands(C)
    vkCmdEndRenderPass()
  Primary command buffer (frame 2)
    vkCmdBeginRenderPass()
    vkCmdExecuteCommands(A)
    vkCmdExecuteCommands(B)
    vkCmdExecuteCommands(C)
    vkCmdEndRenderPass()
  Primary command buffer (frame 3)
    vkCmdBeginRenderPass()
    vkCmdExecuteCommands(A)
    vkCmdExecuteCommands(B)
    vkCmdExecuteCommands(C)
    vkCmdEndRenderPass()
Recording the primary command buffer should be faster than recording a significant amount of work into the secondary command buffers. However, there is typically some cost - especially for implementations which require the secondary command buffers to be copied into the primary command buffer. This approach also assumes that the secondary command buffers are at least double-buffered, and that the threads are suitably synchronized.
Since primary command buffers can be recorded in parallel and vkQueueSubmit() allows multiple command buffers to be submitted efficiently, exposing parallelism across secondary command buffers is not necessary in many applications, so this technique should be matched to the rendering workload. Note that it can also be possible to reuse secondary command buffers, although again this may carry some driver overhead (hopefully less than recording anew). Command buffer reuse should be applied selectively, since prerecorded commands cannot benefit from per-frame optimizations such as frustum culling.
Destroying a VkRenderPass
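Once a render pass is no longer needed, it can be destroyed with vkDestroyRenderPass():
VkDevice device: Logical device that created the render pass
VkRenderPass renderPass: Render pass to destroy
const VkAllocationCallbacks* pAllocator: Host memory allocation callback (NULL if NULL was used at creation)
Note that it is up to the application to ensure that nothing which refers to the render pass is still rendering at the point vkDestroyRenderPass() is called, for example by using vkWaitForFences() with a VkFence handle previously passed to vkQueueSubmit().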
Multi-sampling
Tiled rendering also provides a low-bandwidth way to implement antialiasing: we can render to the tiles normally, but average pixel values as part of the operation of writing the tile memory; this downsampling step is known as "resolving" the tile buffer.
Vulkan has the concept of a number of samples associated with an image. In a simple implementation the image might have several values stored at each pixel location; more complex implementations have compressed schemes. Therefore an image has a number of samples associated with it at image creation time. For multi-sampled rendering in Vulkan, the multi-sampled image is treated separately from the final single-sampled image; this provides separate control over what values need to reach memory, since - like the depth buffer - the multi-sampled image may only need to be accessed during the processing of a tile. For this reason, if the multi-sampled image is not required after the render pass, it can be created with VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT and bound to an allocation created with VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT, as described above for depth buffers. The multi-sampled attachment storeOp can then be set to VK_ATTACHMENT_STORE_OP_DONT_CARE in the VkAttachmentDescription, so that (at least on tiled renderers) the full multi-sampled attachment does not need to be written to memory, which can save a lot of bandwidth.
To control multi-sampling, the index of an attachment with more than one sample should be used in the VkSubpassDescription's pColorAttachments array, and the index of a corresponding attachment with exactly one sample should be placed at the same index of the pResolveAttachments array; the multi-sampled image is then resolved to the single-sampled image at the end of the current subpass. (These indices refer to the pAttachments array of VkRenderPassCreateInfo, whose entries correspond one-to-one with the image views in the pAttachments array of the VkFramebufferCreateInfo.) To use pResolveAttachments for some attachments but not others, the corresponding entry in the pResolveAttachments array can be set to VK_ATTACHMENT_UNUSED to avoid resolving that multi-sampled image.
For example, if we had three multi-sampled attachments and only wanted the first and third to be resolved to single-sampled form, the VkSubpassDescription may have the following entries:
Index 0: pColorAttachments[0] = index of first multi-sampled attachment; pResolveAttachments[0] = index of first single-sampled attachment
Index 1: pColorAttachments[1] = index of second multi-sampled attachment; pResolveAttachments[1] = VK_ATTACHMENT_UNUSED
Index 2: pColorAttachments[2] = index of third multi-sampled attachment; pResolveAttachments[2] = index of second single-sampled attachment
Remember that if we don't want to resolve any attachments in the subpass, pResolveAttachments can simply be set to NULL. Multi-sampled images can also be resolved to a single-sample image with vkCmdResolveImage() - but this happens outside the render pass and requires a separate access to memory, so it is a much less efficient solution if it can be avoided. Note that you can write both the resolved and multi-sampled images out of the same render pass by setting the storeOp of both attachments to VK_ATTACHMENT_STORE_OP_STORE.
Resolving an image outside a render pass
On some occasions, the attachment containing all samples may need to be written to memory for later processing (for example, use in a later render pass as an input attachment). It is possible to resolve a multi-sampled image to a single-sampled one without using it as an attachment in a render pass using the vkCmdResolveImage() command.
However, please bear in mind that this should be the exception to normal rendering, not the default approach. Writing out the multi-sampled attachment to off-chip memory (rather than using VK_ATTACHMENT_STORE_OP_DONT_CARE) has a high bandwidth cost, and vkCmdResolveImage() itself must then read all this data back, process it, and write the single-sampled output. It is very much more efficient to perform resolve operations inside a render pass where possible.
Multiple subpasses
The render pass mechanism described so far is quite verbose for use with a single subpass. The reason for this is the flexibility that it provides when using multiple subpasses.
Some rendering techniques, notably deferred shading and deferred lighting, traverse the scene geometry once to create a framebuffer, then use the rendering results in the framebuffer for further rendering operations. The same applies to, for example, tone mapping effects applied after rendering. In a tiled renderer, because each of these operations requires access only to the current pixel and not the entire framebuffer, all of these operations can be performed consecutively on a per-tile basis, avoiding the need to write intermediate values out to memory. This can provide a significant bandwidth (and therefore power and performance) improvement. There is a graphical example of how deferred shading is evaluated on a tiler towards the end of the Understanding Tiling article.
Note that because the render area size is defined by the width and height fields of the VkFramebufferCreateInfo structure, the render area of each attachment is effectively the same size, and this is true for all subpasses in a render pass. If a rendering technique requires reading values outside the current fragment area (which on a tiler would mean accessing rendered data outside the currently-rendering tile), separate render passes must be used.
Taking the example of deferred lighting, we might render the scene in three "subpasses":
The first subpass renders the geometry and stores the depth, normal vector and specular spread function.
The second subpass renders each light's bounds, accumulating a specular and diffuse color for each light that is calculated with the position, normal and specular spread function from the first subpass.
Finally, the scene geometry is processed again with conventional forward shading, picking up the light contributions from the results of the second subpass.
Since the shading in the first subpass is highly simplistic, the shader run-time cost can be significantly reduced in this approach, although the degree of shader parallelism in the final subpass may still depend on fragment coverage. The related deferred shading technique can allow for better shader parallelism at the cost of reduced flexibility and increased intermediate storage requirements.
Multiple attachments for multiple subpasses
In our deferred lighting example, the depth buffer is used in all three subpasses. It should only be updated by the first, but the lighting subpass needs the depth attachment both to provide accurate bounds for each light and to calculate the shading position in world space, and the final subpass can inherit the depth buffer to avoid unnecessary overdraw.
In this case, our render pass might use the following attachments:
Attachment 0:
  flags: 0 (No flags)
  samples: VK_SAMPLE_COUNT_1_BIT (Single-sampled)
  format: VK_FORMAT_B8G8R8A8_UNORM
  loadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE (Assuming this will be completely overwritten)
  storeOp: VK_ATTACHMENT_STORE_OP_DONT_CARE (Intermediate storage, not written)
  stencilLoadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE (Unused)
  stencilStoreOp: VK_ATTACHMENT_STORE_OP_DONT_CARE (Unused)
  initialLayout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL (Rendering to it)
  finalLayout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL (Rendering to it)
Attachment 1:
  flags: 0 (No flags)
  samples: VK_SAMPLE_COUNT_1_BIT (Single-sampled)
  format: VK_FORMAT_D16_UNORM (Depth)
  loadOp: VK_ATTACHMENT_LOAD_OP_CLEAR (Need empty depth buffer before use)
  storeOp: VK_ATTACHMENT_STORE_OP_DONT_CARE (Intermediate storage, not written)
  stencilLoadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE (Unused)
  stencilStoreOp: VK_ATTACHMENT_STORE_OP_DONT_CARE (Unused)
  initialLayout: VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL (Rendering to it)
  finalLayout: VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL (Rendering to it)
Attachment 2:
  flags: 0 (No flags)
  samples: VK_SAMPLE_COUNT_1_BIT (Single-sampled)
  format: VK_FORMAT_B8G8R8A8_UNORM (Accumulated diffuse lighting contribution)
  loadOp: VK_ATTACHMENT_LOAD_OP_CLEAR (Accumulating, so start at 0)
  storeOp: VK_ATTACHMENT_STORE_OP_DONT_CARE (Intermediate storage, not written)
  stencilLoadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE (Unused)
  stencilStoreOp: VK_ATTACHMENT_STORE_OP_DONT_CARE (Unused)
  initialLayout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL (Rendering to it)
  finalLayout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL (Rendering to it)
Attachment 3:
  flags: 0 (No flags)
  samples: VK_SAMPLE_COUNT_1_BIT (Single-sampled)
  format: VK_FORMAT_B8G8R8A8_UNORM (Accumulated specular lighting contribution)
  loadOp: VK_ATTACHMENT_LOAD_OP_CLEAR (Accumulating, so start at 0)
  storeOp: VK_ATTACHMENT_STORE_OP_DONT_CARE (Intermediate storage, not written)
  stencilLoadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE (Unused)
  stencilStoreOp: VK_ATTACHMENT_STORE_OP_DONT_CARE (Unused)
  initialLayout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL (Rendering to it)
  finalLayout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL (Rendering to it)
Attachment 4:
  flags: 0 (No flags)
  samples: VK_SAMPLE_COUNT_1_BIT (Single-sampled)
  format: VK_FORMAT_B8G8R8A8_UNORM (Final output of rendering)
  loadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE (Assuming rendering the whole frame)
  storeOp: VK_ATTACHMENT_STORE_OP_STORE (Write output of rendering)
  stencilLoadOp: VK_ATTACHMENT_LOAD_OP_DONT_CARE (Unused)
  stencilStoreOp: VK_ATTACHMENT_STORE_OP_DONT_CARE (Unused)
  initialLayout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL (Rendering to it)
  finalLayout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL (Rendering to it)
That is:
Attachment 0 holds the surface normal and specular factor output by the first subpass, and used by the second subpass.
Attachment 1 holds the depth buffer for the scene, and applies to all three subpasses.
Attachment 2 holds the diffuse contributions from light sources output by the second subpass and read by the third.
Attachment 3 holds the specular contributions from light sources output by the second subpass and read by the third.
Attachment 4 holds the final result of rendering generated by the third subpass.
Relating attachments to subpasses
To associate the way these attachments are used with each subpass, we need a more complex array of VkSubpassDescription objects to pass to the pSubpasses member of our VkRenderPassCreateInfo object:
pSubpasses[0]

| Member | Value |
|---|---|
| flags | 0 |
| pipelineBindPoint | VK_PIPELINE_BIND_POINT_GRAPHICS |
| inputAttachmentCount | 0 |
| pInputAttachments | NULL |
| colorAttachmentCount | 1 |
| pColorAttachments[0].attachment | 0 (normal + specularity) |
| pColorAttachments[0].layout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL |
| pResolveAttachments | NULL |
| pDepthStencilAttachment->attachment | 1 (depth) |
| pDepthStencilAttachment->layout | VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL |
| preserveAttachmentCount | 0 |
| pPreserveAttachments | NULL |

pSubpasses[1]

| Member | Value |
|---|---|
| flags | 0 |
| pipelineBindPoint | VK_PIPELINE_BIND_POINT_GRAPHICS |
| inputAttachmentCount | 1 |
| pInputAttachments[0].attachment | 0 (normal + specularity) |
| pInputAttachments[0].layout | VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL |
| colorAttachmentCount | 2 |
| pColorAttachments[0].attachment | 2 (diffuse lighting) |
| pColorAttachments[0].layout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL |
| pColorAttachments[1].attachment | 3 (specular lighting) |
| pColorAttachments[1].layout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL |
| pResolveAttachments | NULL |
| pDepthStencilAttachment->attachment | 1 (depth) |
| pDepthStencilAttachment->layout | VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL |
| preserveAttachmentCount | 0 |
| pPreserveAttachments | NULL |

pSubpasses[2]

| Member | Value |
|---|---|
| flags | 0 |
| pipelineBindPoint | VK_PIPELINE_BIND_POINT_GRAPHICS |
| inputAttachmentCount | 2 |
| pInputAttachments[0].attachment | 2 (diffuse lighting) |
| pInputAttachments[0].layout | VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL |
| pInputAttachments[1].attachment | 3 (specular lighting) |
| pInputAttachments[1].layout | VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL |
| colorAttachmentCount | 1 |
| pColorAttachments[0].attachment | 4 (final output) |
| pColorAttachments[0].layout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL |
| pResolveAttachments | NULL |
| pDepthStencilAttachment->attachment | 1 (depth) |
| pDepthStencilAttachment->layout | VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL |
| preserveAttachmentCount | 0 |
| pPreserveAttachments | NULL |
Since all attachments except the final color output are used only for intermediate values in this example, they can be created with VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT set and bound to memory from a memory type with VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT. Tiling hardware typically limits the number and type of attachments that can be kept in flight concurrently, so even with this optimization, an implementation may still have to spill intermediate results to main memory.
More complex arrangements of subpasses are possible. If an attachment is not used during a subpass, but is needed in previous and subsequent subpasses, the attachment should appear in the pPreserveAttachments array of the subpass. Implementations can change the order in which subpasses are evaluated (while preserving dependencies) in order to reduce the need for spilling. In the above example, attachment 0 is not preserved, and the implementation may use the same internal tile memory for both it and the final output attachment. It is also possible to use multi-sampling with these approaches, but this complicates the intermediate read operations and may make it more likely that tilers will have to spill to external memory.
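The preservation rule described above can be sketched as a small helper that, given which attachments each subpass references, lists the attachments a subpass must place in pPreserveAttachments. The helper preserve_list() and the MAX_* limits are hypothetical, and the sketch assumes the subpasses form a simple chain of dependencies; the specification states the exact rule in terms of subpass dependencies rather than subpass order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUBPASSES   8
#define MAX_ATTACHMENTS 8

/* uses[s][a] is true if subpass s references attachment a in any role
   (input, color, resolve or depth/stencil). Writes the attachments subpass s
   must preserve into out[] and returns how many there are. */
uint32_t preserve_list(bool uses[MAX_SUBPASSES][MAX_ATTACHMENTS],
                       uint32_t subpass_count, uint32_t attachment_count,
                       uint32_t s, uint32_t out[MAX_ATTACHMENTS])
{
    assert(subpass_count <= MAX_SUBPASSES && attachment_count <= MAX_ATTACHMENTS);
    assert(s < subpass_count);
    uint32_t n = 0;
    for (uint32_t a = 0; a < attachment_count; ++a) {
        if (uses[s][a])
            continue;                       /* used directly, nothing to preserve */
        bool before = false, after = false;
        for (uint32_t p = 0; p < s; ++p)
            before = before || uses[p][a];  /* produced or used earlier */
        for (uint32_t q = s + 1; q < subpass_count; ++q)
            after = after || uses[q][a];    /* needed again later */
        if (before && after)
            out[n++] = a;
    }
    return n;
}
```

For the three-subpass example nothing needs preserving; if a hypothetical fourth subpass also read attachment 0, subpass 2 would have to list attachment 0 in pPreserveAttachments, and the implementation could no longer reuse its tile memory for the final output.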
Subpass dependencies
When multiple subpasses are in use, the driver needs to be told the relationships between them. A subpass can depend on operations submitted outside the current render pass, or be the source on which later rendering depends. Most commonly, the need is to ensure that the fragment shader of an earlier subpass has completed rendering (to the current tile, on a tiler) before the next subpass starts to read that data. An array of subpass dependencies, if there are any, is passed to VkRenderPassCreateInfo, defining a set of dependencies between a "source" (the thing being waited on) and a "destination" (the thing doing the waiting). Each subpass dependency (VkSubpassDependency) is defined as follows:

| Type | Member | Description |
|---|---|---|
| uint32_t | srcSubpass | The index of the subpass being depended upon by dstSubpass |
| uint32_t | dstSubpass | The index of the subpass depending on srcSubpass |
| VkPipelineStageFlags | srcStageMask | Which pipeline stages must have completed for the dependency |
| VkPipelineStageFlags | dstStageMask | Which pipeline stages wait on the dependency |
| VkAccessFlags | srcAccessMask | Which memory accesses in the source stages the dependency covers |
| VkAccessFlags | dstAccessMask | Which memory accesses in the destination stages wait on the dependency |
| VkDependencyFlags | dependencyFlags | Other configuration of the dependency |
Typically, for dependencies between color attachment writes in one subpass and input attachment reads in a later subpass's fragment shader, we might expect the following settings:

| Member | Value | Meaning |
|---|---|---|
| srcStageMask | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | Fragment data has been written |
| dstStageMask | VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT | Don't start shading until the data is available |
| srcAccessMask | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | Wait for color data to be written |
| dstAccessMask | VK_ACCESS_INPUT_ATTACHMENT_READ_BIT | Don't read input attachments in the shader before they are ready |
| dependencyFlags | VK_DEPENDENCY_BY_REGION_BIT | Only the current fragment (or tile) needs to be synchronized, not the whole framebuffer |

In our deferred lighting example there are three subpasses, with dependencies between the first and second and between the second and third. We therefore set the dependencyCount member of our VkRenderPassCreateInfo to 2 and point its pDependencies member at the following array:

| Member | pDependencies[0] | pDependencies[1] |
|---|---|---|
| srcSubpass | 0 | 1 |
| dstSubpass | 1 | 2 |
| srcStageMask | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT |
| dstStageMask | VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT | VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT |
| srcAccessMask | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
| dstAccessMask | VK_ACCESS_INPUT_ATTACHMENT_READ_BIT | VK_ACCESS_INPUT_ATTACHMENT_READ_BIT |
| dependencyFlags | VK_DEPENDENCY_BY_REGION_BIT | VK_DEPENDENCY_BY_REGION_BIT |
Using subpasses in a command buffer
When recording to a VkCommandBuffer, vkCmdBeginRenderPass() and vkCmdEndRenderPass() wrap the render pass operations, as described above. After vkCmdBeginRenderPass() is called, subsequent commands apply to the first subpass of the render pass.
To move on to later subpasses, call vkCmdNextSubpass(). Each call advances to the next subpass index, in increasing order; the last subpass must be current when vkCmdEndRenderPass() is called, so a render pass with N subpasses needs N - 1 calls. Synchronization of attachment accesses described in the subpass dependencies is handled automatically.
Using subpasses in shaders
In SPIR-V, the contents of an input attachment can be accessed with the OpImageRead operation, with an OpTypeImage that has a dim argument of SubpassData. The coordinate argument of the OpImageRead must be (0,0), and corresponds to accessing the input attachment at the current fragment location. When multi-sampling, the sample operand to OpImageRead can be used to access separate samples at the current fragment.
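In GLSL, this functionality is exposed through the subpassLoad() function, applied to variables of type subpassInput (or subpassInputMS for multi-sampled attachments) that are declared with an input_attachment_index layout qualifier.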
Summary
The Vulkan API acknowledges that modern rendering techniques may perform multiple passes over the same image data, and is designed to ensure that these approaches are explicitly and efficiently supported on modern graphics hardware. The unfortunate consequence of this expressiveness is a complex description and verbose simple examples, although the overhead in a practical, optimized renderer should be less significant.
In Vulkan, the render pass is an explicit concept within which rendering operations execute. A VkFramebuffer, with its list of attachments, is associated with the render pass when rendering work is recorded into a VkCommandBuffer. The render pass is divided into one or more subpasses, with explicitly defined interactions between them. Because a VkRenderPass object holding this configuration can be shared between rendering operations, its cost is limited in real-world, complex applications. Providing this additional information to a driver can significantly reduce memory overhead, especially on tiled architectures, without relying on the unpredictable heuristics that more traditional APIs apply to achieve good performance.
Additional reading
A simplified version of the content of this article may be found in a presentation on the subject at a UK developer event.