DML_QUANTIZED_LINEAR_MATRIX_MULTIPLY_OPERATOR_DESC structure (directml.h)

Performs a matrix multiplication on quantized data. This operator is mathematically equivalent to dequantizing the inputs, performing the matrix multiplication, and then quantizing the output.

This operator requires the matrix multiply input tensors to be 4D, formatted as { BatchCount, ChannelCount, Height, Width }. The operator performs BatchCount * ChannelCount independent matrix multiplications.

For example, if ATensor has Sizes of { BatchCount, ChannelCount, M, K }, and BTensor has Sizes of { BatchCount, ChannelCount, K, N }, and OutputTensor has Sizes of { BatchCount, ChannelCount, M, N }, then the matrix multiply operator will perform BatchCount * ChannelCount independent matrix multiplications of dimensions {M,K} x {K,N} = {M,N}.
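
The shape relationship above can be sketched in plain C++. This is only an illustration: `Sizes4D`, `MatMulOutputSizes`, and `IndependentMatMulCount` are hypothetical names for this sketch and are not part of directml.h.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper type for this sketch: 4D sizes in
// { BatchCount, ChannelCount, Height, Width } order.
using Sizes4D = std::array<std::uint32_t, 4>;

// Returns the OutputTensor sizes { BatchCount, ChannelCount, M, N } implied by
// ATensor sizes { BatchCount, ChannelCount, M, K } and
// BTensor sizes { BatchCount, ChannelCount, K, N }.
inline Sizes4D MatMulOutputSizes(const Sizes4D& a, const Sizes4D& b)
{
    assert(a[0] == b[0] && a[1] == b[1]); // same BatchCount and ChannelCount
    assert(a[3] == b[2]);                 // inner dimension K must agree
    return Sizes4D{ a[0], a[1], a[2], b[3] };
}

// Number of independent {M,K} x {K,N} multiplications: BatchCount * ChannelCount.
inline std::uint64_t IndependentMatMulCount(const Sizes4D& a)
{
    return static_cast<std::uint64_t>(a[0]) * a[1];
}
```

For instance, sizes { 2, 3, 4, 5 } for A and { 2, 3, 5, 7 } for B give output sizes { 2, 3, 4, 7 } and six independent multiplications.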

Dequantize function

f(Input, Scale, ZeroPoint) = (Input - ZeroPoint) * Scale

Quantize function

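f(Input, Scale, ZeroPoint) = clamp(round(Input / Scale) + ZeroPoint, Min, Max)

Here Min and Max are the smallest and largest values representable in the output data type (for example, 0 and 255 for UINT8).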
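
As a rough CPU reference for the "dequantize, multiply, quantize" equivalence, the two functions can be combined as below. This is a minimal sketch under several assumptions that the text above does not state: UINT8 data, a single {M,K} x {K,N} multiplication with per-tensor scale and zero point, row-major storage, Min and Max taken as 0 and 255, and `std::nearbyint` as the rounding function. All function names are hypothetical and are not DirectML APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dequantize: (Input - ZeroPoint) * Scale
inline float Dequantize(std::uint8_t input, float scale, std::uint8_t zeroPoint)
{
    return (static_cast<float>(input) - static_cast<float>(zeroPoint)) * scale;
}

// Quantize: clamp(round(Input / Scale) + ZeroPoint, Min, Max)
// Assumptions: Min = 0 and Max = 255 (UINT8 output); the tie-breaking rule of
// "round" is not specified in the text, std::nearbyint is used here.
inline std::uint8_t Quantize(float input, float scale, std::uint8_t zeroPoint)
{
    assert(scale > 0.0f);
    const float shifted = std::nearbyint(input / scale) + static_cast<float>(zeroPoint);
    const float clamped = std::min(std::max(shifted, 0.0f), 255.0f);
    return static_cast<std::uint8_t>(clamped);
}

// Reference for one row-major {M,K} x {K,N} multiplication with per-tensor
// quantization parameters: dequantize both inputs, accumulate in float,
// then quantize each output element.
inline std::vector<std::uint8_t> QuantizedMatMulReference(
    const std::vector<std::uint8_t>& a, float aScale, std::uint8_t aZeroPoint,
    const std::vector<std::uint8_t>& b, float bScale, std::uint8_t bZeroPoint,
    float outScale, std::uint8_t outZeroPoint,
    std::size_t m, std::size_t k, std::size_t n)
{
    assert(a.size() == m * k);
    assert(b.size() == k * n);
    std::vector<std::uint8_t> out(m * n);
    for (std::size_t row = 0; row < m; ++row)
    {
        for (std::size_t col = 0; col < n; ++col)
        {
            float sum = 0.0f;
            for (std::size_t i = 0; i < k; ++i)
            {
                sum += Dequantize(a[row * k + i], aScale, aZeroPoint) *
                       Dequantize(b[i * n + col], bScale, bZeroPoint);
            }
            out[row * n + col] = Quantize(sum, outScale, outZeroPoint);
        }
    }
    return out;
}
```

A real implementation is free to compute the result differently (for example, with integer accumulation); the sketch only mirrors the mathematical definition.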

Syntax

struct DML_QUANTIZED_LINEAR_MATRIX_MULTIPLY_OPERATOR_DESC {
  const DML_TENSOR_DESC *ATensor;
  const DML_TENSOR_DESC *AScaleTensor;
  const DML_TENSOR_DESC *AZeroPointTensor;
  const DML_TENSOR_DESC *BTensor;
  const DML_TENSOR_DESC *BScaleTensor;
  const DML_TENSOR_DESC *BZeroPointTensor;
  const DML_TENSOR_DESC *OutputScaleTensor;
  const DML_TENSOR_DESC *OutputZeroPointTensor;
  const DML_TENSOR_DESC *OutputTensor;
};

Members

ATensor

Type: const DML_TENSOR_DESC*

A tensor containing the A data. This tensor's dimensions should be { BatchCount, ChannelCount, M, K }.

AScaleTensor

Type: const DML_TENSOR_DESC*

A tensor containing the ATensor scale data. The expected dimensions of the AScaleTensor are { 1, 1, 1, 1 } if per-tensor quantization is required, or { 1, 1, M, 1 } if per-row quantization is required. These scale values are used for dequantizing the ATensor values.

AZeroPointTensor

Type: _Maybenull_ const DML_TENSOR_DESC*

An optional tensor containing the ATensor zero point data. The expected dimensions of the AZeroPointTensor are { 1, 1, 1, 1 } if per-tensor quantization is required, or { 1, 1, M, 1 } if per-row quantization is required. These zero point values are used for dequantizing the ATensor values.

BTensor

Type: const DML_TENSOR_DESC*

A tensor containing the B data. This tensor's dimensions should be { BatchCount, ChannelCount, K, N }.

BScaleTensor

Type: const DML_TENSOR_DESC*

A tensor containing the BTensor scale data. The expected dimensions of the BScaleTensor are { 1, 1, 1, 1 } if per-tensor quantization is required, or { 1, 1, 1, N } if per-column quantization is required. These scale values are used for dequantizing the BTensor values.

BZeroPointTensor

Type: _Maybenull_ const DML_TENSOR_DESC*

An optional tensor containing the BTensor zero point data. The expected dimensions of the BZeroPointTensor are { 1, 1, 1, 1 } if per-tensor quantization is required, or { 1, 1, 1, N } if per-column quantization is required. These zero point values are used for dequantizing the BTensor values.
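
The per-tensor, per-row, and per-column shapes described for the A and B quantization parameters can be read as broadcasting: a { 1, 1, 1, 1 } parameter applies to every element, a { 1, 1, M, 1 } parameter applies one value per row of A, and a { 1, 1, 1, N } parameter applies one value per column of B. The sketch below illustrates that indexing on the CPU. `QuantParams`, `ParamIndex`, `DequantizeA`, and `DequantizeB` are hypothetical names, and the sketch assumes UINT8 data, row-major storage, and scale and zero point vectors of equal length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for this sketch: flat vectors holding either one
// value (per tensor) or one value per row/column.
struct QuantParams
{
    std::vector<float> scales;
    std::vector<std::uint8_t> zeroPoints;
};

// A single parameter is shared by all rows/columns; otherwise index directly.
inline std::size_t ParamIndex(std::size_t paramCount, std::size_t index)
{
    return paramCount == 1 ? 0 : index;
}

// Dequantizes a row-major M x K matrix A. With M parameters, each row of A
// uses its own scale and zero point (the { 1, 1, M, 1 } case).
inline std::vector<float> DequantizeA(const std::vector<std::uint8_t>& a,
                                      std::size_t m, std::size_t k,
                                      const QuantParams& p)
{
    assert(a.size() == m * k);
    assert(p.scales.size() == 1 || p.scales.size() == m);
    assert(p.zeroPoints.size() == p.scales.size()); // assumption of this sketch
    std::vector<float> out(a.size());
    for (std::size_t row = 0; row < m; ++row)
    {
        const std::size_t i = ParamIndex(p.scales.size(), row);
        for (std::size_t col = 0; col < k; ++col)
        {
            const float value = static_cast<float>(a[row * k + col]);
            out[row * k + col] = (value - static_cast<float>(p.zeroPoints[i])) * p.scales[i];
        }
    }
    return out;
}

// Dequantizes a row-major K x N matrix B. With N parameters, each column of B
// uses its own scale and zero point (the { 1, 1, 1, N } case).
inline std::vector<float> DequantizeB(const std::vector<std::uint8_t>& b,
                                      std::size_t k, std::size_t n,
                                      const QuantParams& p)
{
    assert(b.size() == k * n);
    assert(p.scales.size() == 1 || p.scales.size() == n);
    assert(p.zeroPoints.size() == p.scales.size()); // assumption of this sketch
    std::vector<float> out(b.size());
    for (std::size_t row = 0; row < k; ++row)
    {
        for (std::size_t col = 0; col < n; ++col)
        {
            const std::size_t i = ParamIndex(p.scales.size(), col);
            const float value = static_cast<float>(b[row * n + col]);
            out[row * n + col] = (value - static_cast<float>(p.zeroPoints[i])) * p.scales[i];
        }
    }
    return out;
}
```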

OutputScaleTensor

Type: const DML_TENSOR_DESC*

A tensor containing the OutputTensor scale data. The expected dimensions of the OutputScaleTensor are { 1, 1, 1, 1 } if per-tensor quantization is required, or { 1, 1, M, 1 } if per-row quantization is required. This scale value is used for quantizing the OutputTensor values.

OutputZeroPointTensor

Type: _Maybenull_ const DML_TENSOR_DESC*

An optional tensor containing the OutputTensor zero point data. The expected dimensions of the OutputZeroPointTensor are { 1, 1, 1, 1 } if per-tensor quantization is required, or { 1, 1, M, 1 } if per-row quantization is required. This zero point value is used for quantizing the OutputTensor values.

OutputTensor

Type: const DML_TENSOR_DESC*

A tensor to write the results to. This tensor's dimensions are { BatchCount, ChannelCount, M, N }.

Availability

This operator was introduced in DML_FEATURE_LEVEL_2_1.

Tensor constraints

  • AScaleTensor, AZeroPointTensor, BScaleTensor, BZeroPointTensor, OutputScaleTensor, and OutputZeroPointTensor must have the same DimensionCount.
  • ATensor, BTensor, and OutputTensor must have the same DimensionCount.
  • BTensor and BZeroPointTensor must have the same DataType.
  • OutputTensor and OutputZeroPointTensor must have the same DataType.
  • ATensor and AZeroPointTensor must have the same DataType.
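
The constraints above can be checked mechanically before creating the operator. The following is a sketch only: `DataType`, `TensorInfo`, `QuantizedMatMulTensors`, and `SatisfiesTensorConstraints` are hypothetical stand-ins rather than directml.h types, and the sketch assumes that an omitted (null) optional tensor is simply skipped by each check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the data type and dimension count of a tensor.
enum class DataType { Float32, Int8, UInt8 };

struct TensorInfo
{
    DataType dataType;
    std::uint32_t dimensionCount;
};

// Mirrors the member order of the operator description; the three zero point
// tensors are optional and may be nullptr.
struct QuantizedMatMulTensors
{
    const TensorInfo* a;
    const TensorInfo* aScale;
    const TensorInfo* aZeroPoint;      // optional
    const TensorInfo* b;
    const TensorInfo* bScale;
    const TensorInfo* bZeroPoint;      // optional
    const TensorInfo* outputScale;
    const TensorInfo* outputZeroPoint; // optional
    const TensorInfo* output;
};

// Assumption of this sketch: a null optional tensor trivially passes.
inline bool SameDimensionCount(const TensorInfo& reference, const TensorInfo* other)
{
    return other == nullptr || other->dimensionCount == reference.dimensionCount;
}

inline bool SameDataType(const TensorInfo& reference, const TensorInfo* other)
{
    return other == nullptr || other->dataType == reference.dataType;
}

inline bool SatisfiesTensorConstraints(const QuantizedMatMulTensors& t)
{
    // Required tensors must be present.
    assert(t.a && t.aScale && t.b && t.bScale && t.outputScale && t.output);

    // Scale and zero point tensors share one DimensionCount.
    const bool quantParamsAgree =
        SameDimensionCount(*t.aScale, t.aZeroPoint) &&
        SameDimensionCount(*t.aScale, t.bScale) &&
        SameDimensionCount(*t.aScale, t.bZeroPoint) &&
        SameDimensionCount(*t.aScale, t.outputScale) &&
        SameDimensionCount(*t.aScale, t.outputZeroPoint);

    // ATensor, BTensor, and OutputTensor share one DimensionCount.
    const bool matricesAgree =
        SameDimensionCount(*t.a, t.b) && SameDimensionCount(*t.a, t.output);

    // Each zero point tensor matches the DataType of the tensor it belongs to.
    const bool typesAgree =
        SameDataType(*t.a, t.aZeroPoint) &&
        SameDataType(*t.b, t.bZeroPoint) &&
        SameDataType(*t.output, t.outputZeroPoint);

    return quantParamsAgree && matricesAgree && typesAgree;
}
```

The check covers only the constraints listed in this section; the supported dimension counts and data types for each feature level still apply separately.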

Tensor support

DML_FEATURE_LEVEL_4_0 and above

Tensor                 Kind            Supported dimension counts  Supported data types
ATensor                Input           2 to 4                      INT8, UINT8
AScaleTensor           Input           1 to 4                      FLOAT32
AZeroPointTensor       Optional input  1 to 4                      INT8, UINT8
BTensor                Input           2 to 4                      INT8, UINT8
BScaleTensor           Input           1 to 4                      FLOAT32
BZeroPointTensor       Optional input  1 to 4                      INT8, UINT8
OutputScaleTensor      Input           1 to 4                      FLOAT32
OutputZeroPointTensor  Optional input  1 to 4                      INT8, UINT8
OutputTensor           Output          2 to 4                      INT8, UINT8

DML_FEATURE_LEVEL_2_1 and above

Tensor                 Kind            Supported dimension counts  Supported data types
ATensor                Input           4                           INT8, UINT8
AScaleTensor           Input           4                           FLOAT32
AZeroPointTensor       Optional input  4                           INT8, UINT8
BTensor                Input           4                           INT8, UINT8
BScaleTensor           Input           4                           FLOAT32
BZeroPointTensor       Optional input  4                           INT8, UINT8
OutputScaleTensor      Input           4                           FLOAT32
OutputZeroPointTensor  Optional input  4                           INT8, UINT8
OutputTensor           Output          4                           INT8, UINT8

Requirements

Requirement               Value
Minimum supported client  Windows 10 Build 20348
Minimum supported server  Windows 10 Build 20348
Header                    directml.h