D3D12 AV1 video encoding

The Direct3D12 video encoding feature is extended to support AV1 encoding starting in Windows 11, version 24H2 (WDDM 3.2). This article describes the extension points in the existing D3D12 Video Encode DDI, including the modified and new structures that AV1 encoding requires. For more information, including application-level specifics, see the AV1 D3D12 Video Encoding Specification.

Extensions to rate control

The following existing enumerations are updated with extensions to rate control and rate control support:

When D3D12DDI_VIDEO_ENCODER_RATE_CONTROL_FLAG_0096_ENABLE_EXTENSION1_SUPPORT is set, D3D12DDI_VIDEO_ENCODER_RATE_CONTROL_CONFIGURATION_PARAMS_0080_2.pConfiguration_XXX points to the extended rate control structures; when it isn't set, it points to the legacy structures. The mapping for each rate control mode is listed in the table on the D3D12DDI_VIDEO_ENCODER_RATE_CONTROL_FLAGS_0080 reference page.

Video encoding support extensions

The existing video-related framework is extended to allow drivers to report AV1 video encoding support and capabilities. This section lists the added or updated structures and enumerations that are used to query and report AV1 video encoding support.

Encoding operation

Expected bitstream header values for AV1

Driver/host header coding responsibilities

Given an encoded frame with K tiles, the driver writes the K decode_tile() AV1 syntax elements into the compressed bitstream buffer, one for each tile requested in the EncodeFrame arguments.

The API Client then builds the tile_group_obu() AV1 syntax elements, using the tile_start_and_end_present_flag, tg_start, and tg_end elements to arrange the tiles into tile groups as desired, provided that the tiles are placed sequentially. Each tile_size_minus_1 element is coded from the corresponding tile's D3D12_VIDEO_ENCODER_FRAME_SUBREGION_METADATA information, and the decode_tile() elements are copied from the compressed bitstream buffer. Finally, each tile_group_obu() is wrapped in an open_bitstream_unit() of type OBU_TILE_GROUP, and the resulting OBUs are preceded by an OBU_FRAME_HEADER. For a single tile group, an OBU of type OBU_FRAME can be used instead.
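
The client-side tile group assembly can be sketched in Python as follows. This is a minimal sketch, not part of the D3D12 API: it assumes the per-tile decode_tile() bytes have already been sliced out of the compressed bitstream buffer using the subregion metadata, and the helper names (BitWriter, split_sequential, tile_group_obu) are hypothetical. It follows the AV1 tile_group_obu() syntax: tile_start_and_end_present_flag is coded only when the frame has more than one tile, and tile_size_minus_1 is written as le(TileSizeBytes) for every tile except the last one in the group.

```python
class BitWriter:
    """Minimal MSB-first bit writer for the AV1 f(n) descriptor."""

    def __init__(self):
        self.bits = []

    def f(self, n, value):
        # Append `value` as exactly n bits, most significant bit first.
        for i in range(n - 1, -1, -1):
            self.bits.append((value >> i) & 1)

    def to_bytes(self):
        # byte_alignment(): pad with zero bits up to the next byte boundary.
        bits = self.bits + [0] * (-len(self.bits) % 8)
        return bytes(
            int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
        )


def split_sequential(num_tiles, num_groups):
    """Hypothetical helper: split tiles 0..num_tiles-1 into contiguous groups."""
    base, extra = divmod(num_tiles, num_groups)
    ranges, start = [], 0
    for g in range(num_groups):
        count = base + (1 if g < extra else 0)
        ranges.append((start, start + count - 1))  # (tg_start, tg_end)
        start += count
    return ranges


def tile_group_obu(tiles, tg_start, tg_end, tile_cols_log2, tile_rows_log2,
                   tile_size_bytes, signal_start_end=True):
    """Build one tile_group_obu() payload covering tiles tg_start..tg_end.

    tiles: decode_tile() bytes for every tile of the frame, in tile order.
    tile_size_bytes: TileSizeBytes signaled in the frame header's tile_info().
    signal_start_end: pass False for an OBU_FRAME, where AV1 requires
    tile_start_and_end_present_flag to be 0.
    """
    num_tiles = len(tiles)
    w = BitWriter()
    present = num_tiles > 1 and signal_start_end
    if num_tiles > 1:
        w.f(1, int(present))  # tile_start_and_end_present_flag
    if present:
        tile_bits = tile_cols_log2 + tile_rows_log2
        w.f(tile_bits, tg_start)
        w.f(tile_bits, tg_end)
    elif (tg_start, tg_end) != (0, num_tiles - 1):
        raise ValueError("without tg_start/tg_end the group must contain every tile")
    payload = bytearray(w.to_bytes())
    for tile_num in range(tg_start, tg_end + 1):
        data = tiles[tile_num]
        if tile_num != tg_end:
            # tile_size_minus_1 is le(TileSizeBytes); the group's last tile has none.
            payload += (len(data) - 1).to_bytes(tile_size_bytes, "little")
        payload += data
    return bytes(payload)
```

For a 2x2 tile layout split into two groups, `split_sequential(4, 2)` yields `[(0, 1), (2, 3)]`; each range is passed to `tile_group_obu()`, and each returned payload becomes the body of one OBU_TILE_GROUP.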

The API Client is responsible for inferring obu_extension_flag as (TemporalLayerIndexPlus1 || SpatialLayerIndexPlus1) for the current frame and, when that flag is set, for coding temporal_id and spatial_id in the open_bitstream_unit().
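
The OBU header the client writes around each payload can be sketched as below. This follows the AV1 obu_header() and leb128() syntax, where obu_extension_flag signals that the extension header carrying temporal_id and spatial_id is present. The mapping temporal_id = TemporalLayerIndexPlus1 - 1 (and likewise for spatial_id, floored at 0) is an assumption, and the function names are hypothetical.

```python
# AV1 obu_type values used in this article.
OBU_SEQUENCE_HEADER = 1
OBU_TEMPORAL_DELIMITER = 2
OBU_FRAME_HEADER = 3
OBU_TILE_GROUP = 4
OBU_FRAME = 6


def leb128(value):
    """Encode an unsigned integer as AV1 leb128(): 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def obu_header(obu_type, temporal_layer_index_plus1, spatial_layer_index_plus1,
               payload_size):
    """Build obu_header() and obu_size for an OBU that carries a size field."""
    ext = temporal_layer_index_plus1 > 0 or spatial_layer_index_plus1 > 0
    # obu_forbidden_bit(1)=0 | obu_type(4) | obu_extension_flag(1) |
    # obu_has_size_field(1)=1 | obu_reserved_1bit(1)=0
    out = bytearray([(obu_type << 3) | (int(ext) << 2) | (1 << 1)])
    if ext:
        # Assumption: the IDs are the "Plus1" indices minus one, floored at 0.
        temporal_id = max(temporal_layer_index_plus1 - 1, 0)
        spatial_id = max(spatial_layer_index_plus1 - 1, 0)
        # temporal_id(3) | spatial_id(2) | extension_header_reserved_3bits(3)=0
        out.append((temporal_id << 5) | (spatial_id << 3))
    out += leb128(payload_size)
    return bytes(out)
```

A frame with no temporal or spatial layering gets a one-byte header (0x22 for OBU_TILE_GROUP), so the extension byte is only spent when layer indices are actually in use.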

The EncodeFrame submissions are in encode order, like the other codecs implemented in the D3D12 Encode API.

Resolution changes and spatial scalability

If the driver reports D3D12_VIDEO_ENCODER_SUPPORT_FLAG_RESOLUTION_RECONFIGURATION_AVAILABLE, resolution reconfiguration still applies only to resolution changes on key frames.

The active sequence header must set the max_frame_width_minus_1 and max_frame_height_minus_1 syntax elements to the maximum resolution present in the associated ID3D12VideoEncoderHeap. Frames that use other resolutions also present in that heap can convey the change of resolution by setting the AV1 frame_size_override_flag, which causes frame_size() to code the frame dimensions explicitly.
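
Deriving these sequence header fields from the heap's resolution list can be sketched as follows. This is a minimal sketch with hypothetical function names; it assumes "maximum resolution" means the largest width and the largest height taken independently, so that every resolution in the heap fits under the sequence header limits. Note that AV1 infers frame_size_override_flag as 1 for switch frames instead of reading it, so the second helper only applies to other frame types.

```python
def sequence_frame_size_fields(heap_resolutions):
    """Derive sequence header frame-size fields from (width, height) pairs."""
    max_w_m1 = max(w for w, _ in heap_resolutions) - 1
    max_h_m1 = max(h for _, h in heap_resolutions) - 1
    return {
        # frame_*_bits_minus_1 must leave enough bits for max_frame_*_minus_1.
        "frame_width_bits_minus_1": max(max_w_m1.bit_length(), 1) - 1,
        "frame_height_bits_minus_1": max(max_h_m1.bit_length(), 1) - 1,
        "max_frame_width_minus_1": max_w_m1,
        "max_frame_height_minus_1": max_h_m1,
    }


def frame_size_override_flag(frame_resolution, seq, heap_resolutions):
    """Return frame_size_override_flag for a non-switch frame.

    Raises ValueError if the resolution is not one the heap was created with.
    """
    if tuple(frame_resolution) not in {tuple(r) for r in heap_resolutions}:
        raise ValueError("resolution not present in the encoder heap")
    w, h = frame_resolution
    # 0: the frame uses the sequence header maximum.
    # 1: frame_size() codes frame_width_minus_1 / frame_height_minus_1.
    return int(w - 1 != seq["max_frame_width_minus_1"]
               or h - 1 != seq["max_frame_height_minus_1"])
```

With a heap created for 1920x1080 and 1280x720, the sequence header carries 1919 and 1079; 1920x1080 frames leave the flag at 0, while 1280x720 frames set it to 1.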

If D3D12_VIDEO_ENCODER_AV1_FRAME_TYPE_FLAG_SWITCH_FRAME is supported, the reference frames of a switch frame must have a resolution higher than or equal to that of the switch frame being encoded, and all of the resolutions involved must be present in the associated ID3D12VideoEncoderHeap.

Similarly, if spatial scalability is supported, all of the reference frame resolutions must be present in the associated ID3D12VideoEncoderHeap.
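
A client could check the two reference-resolution constraints above before submitting a frame. The sketch below is hypothetical validation logic, not a D3D12 call; it assumes "higher or equal resolution" means that both the width and the height of each reference are at least those of the switch frame.

```python
def check_reference_resolutions(current, refs, heap_resolutions, is_switch_frame):
    """Return violated constraints as strings; an empty list means none found.

    current and refs are (width, height) tuples; heap_resolutions lists the
    resolutions the ID3D12VideoEncoderHeap was created with.
    """
    heap = {tuple(r) for r in heap_resolutions}
    problems = []
    if tuple(current) not in heap:
        problems.append(f"current {current} not in heap")
    for ref in refs:
        # Applies to switch frames and to spatial scalability alike.
        if tuple(ref) not in heap:
            problems.append(f"reference {ref} not in heap")
        # Assumption: "higher or equal resolution" compares each dimension.
        if is_switch_frame and (ref[0] < current[0] or ref[1] < current[1]):
            problems.append(f"reference {ref} smaller than switch frame {current}")
    return problems
```

Returning a list instead of raising on the first failure lets a client log every violated constraint for a rejected frame at once.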

Rate control notes

The accepted range for D3D12DDI_VIDEO_ENCODER_RATE_CONTROL_QVBR1_0096.ConstantQualityTarget is [0..255]. The lowest value yields the highest quality.

In general, D3D12DDI_VIDEO_ENCODER_SUPPORT_FLAG_0083_0_RATE_CONTROL_RECONFIGURATION_AVAILABLE applies to quality-versus-speed tuning and to the following parameters of each rate control mode: the QP in constant QP mode, and the bitrates and quality levels in CBR, VBR, and QVBR modes. When a reconfiguration touches any other rate control parameter that the driver doesn't support changing, the driver can return D3D12DDI_VIDEO_ENCODER_ENCODE_ERROR_FLAG_0082_0_RECONFIGURATION_REQUEST_NOT_SUPPORTED in D3D12DDI_VIDEO_ENCODER_OUTPUT_METADATA_0083_0.EncodeErrorFlags.
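
The reconfiguration rule can be illustrated with a small client-side classifier. This is a sketch under stated assumptions: the parameter labels are illustrative and do not correspond to DDI field names, and the per-mode sets reflect one reading of the paragraph above (bitrates for CBR, VBR, and QVBR; a quality level for QVBR).

```python
# Parameters the reconfiguration flag is described as covering, per mode.
# The string keys are illustrative labels, not DDI field names.
RECONFIGURABLE = {
    "CQP": {"quality_vs_speed", "qp"},
    "CBR": {"quality_vs_speed", "bitrate"},
    "VBR": {"quality_vs_speed", "bitrate"},
    "QVBR": {"quality_vs_speed", "bitrate", "quality"},
}


def classify_reconfiguration(mode, old_params, new_params):
    """Return (covered, may_be_rejected) sets of parameter names that changed."""
    changed = {k for k, v in new_params.items() if old_params.get(k) != v}
    covered = changed & RECONFIGURABLE[mode]
    # Changes outside the covered set may come back flagged with
    # RECONFIGURATION_REQUEST_NOT_SUPPORTED in the output metadata.
    return covered, changed - covered
```

A client could use the second set to decide ahead of time whether to restart the encoding session instead of attempting an in-place reconfiguration.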

Encoding operation API

The following structures and enumerations are added or updated with extensions to support the AV1 encoding operation:

In addition, a driver's existing PFND3D12DDI_VIDEO_ENCODE_RESOLVE_OUTPUT_METADATA_0082_0 callback must be updated to handle the AV1-specific layout of the resolved metadata buffer.