DML_ROI_ALIGN_GRAD_OPERATOR_DESC structure (directml.h)

Computes backpropagation gradients for ROI_ALIGN and ROI_ALIGN1.

Recall that DML_ROI_ALIGN1_OPERATOR_DESC crops and rescales subregions of an input tensor using either nearest-neighbor sampling or bilinear interpolation. Given an InputGradientTensor with the same sizes as the output of an equivalent DML_OPERATOR_ROI_ALIGN1, this operator produces an OutputGradientTensor with the same sizes as the input of DML_OPERATOR_ROI_ALIGN1.

As an example, consider a DML_OPERATOR_ROI_ALIGN1 that performs a nearest-neighbor scaling of 1.5x in the width, and 0.5x in the height, for 4 non-overlapping crops of an input with dimensions [1, 1, 4, 4]:

ROITensor
[[0, 0, 2, 2],
 [2, 0, 4, 2],
 [0, 2, 2, 4],
 [2, 2, 4, 4]]

BatchIndicesTensor
[0, 0, 0, 0]

InputTensor
[[[[1,   2, |  3,  4],    RoiAlign1     [[[[ 1,  1,  2]]],
   [5,   6, |  7,  8],       -->         [[[ 3,  3,  4]]],
   ------------------                    [[[ 9,  9, 10]]],
   [9,  10, | 11, 12],                   [[[11, 11, 12]]]]
   [13, 14, | 15, 16]]]]

Notice how, within each region, the 0th element contributes to two elements of the output, the 1st element contributes to one element of the output, and the 2nd and 3rd elements contribute to no elements of the output.

The corresponding DML_OPERATOR_ROI_ALIGN_GRAD would perform the following:

InputGradientTensor                  OutputGradientTensor
[[[[ 1,  2,  3]]],    ROIAlignGrad   [[[[ 3,  3, |  9,  6],
 [[[ 4,  5,  6]]],         -->          [ 0,  0, |  0,  0],
 [[[ 7,  8,  9]]],                      ------------------
 [[[10, 11, 12]]]]                      [15,  9, | 21, 12],
                                        [ 0,  0, |  0,  0]]]]

In summary, DML_OPERATOR_ROI_ALIGN_GRAD behaves similarly to a DML_OPERATOR_RESAMPLE_GRAD performed on each batch of the InputGradientTensor when regions don't overlap.
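The nearest-neighbor example above can be reproduced on the CPU with a small scatter-add sketch. This is only an illustration of that one example, not the DirectML implementation: the sample mapping (output columns 0, 1, 2 read region columns 0, 0, 1 of the region's top row) is read off the example rather than derived from the operator's sampling rules, and the names Roi, RoiGrad, Image4x4, and ScatterNearestNeighborGrad are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical types for the worked example: a single 4x4 channel and
// one 1x3 output-sized gradient row per region of interest.
using Image4x4 = std::array<std::array<float, 4>, 4>;
using RoiGrad = std::array<float, 3>;

struct Roi { int x1, y1, x2, y2; };

Image4x4 ScatterNearestNeighborGrad(const std::array<Roi, 4>& rois,
                                    const std::array<RoiGrad, 4>& inputGradient)
{
    // Assumed from the example: output column -> column offset inside the region.
    constexpr std::array<int, 3> sourceColumn = {0, 0, 1};

    Image4x4 outputGradient{};  // zero-initialized
    for (std::size_t r = 0; r < rois.size(); ++r)
    {
        // The mapping above only holds for the 2x2 regions of the example.
        assert(rois[r].x2 - rois[r].x1 == 2 && rois[r].y2 - rois[r].y1 == 2);

        for (std::size_t ox = 0; ox < sourceColumn.size(); ++ox)
        {
            // Only the region's top row was sampled in the forward pass.
            const auto y = static_cast<std::size_t>(rois[r].y1);
            const auto x = static_cast<std::size_t>(rois[r].x1 + sourceColumn[ox]);
            outputGradient[y][x] += inputGradient[r][ox];
        }
    }
    return outputGradient;
}
```

Feeding the ROITensor and InputGradientTensor values from the example into this function accumulates two gradient values into the 0th element of each region and one into the 1st, which is where the sums such as 1 + 2 = 3 and 4 + 5 = 9 in the OutputGradientTensor come from.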

For OutputROIGradientTensor, the math is a little different, and can be summarized by the following pseudocode (assuming that MinimumSamplesPerOutput == 1 and MaximumSamplesPerOutput == 1):

for each region of interest (ROI), at index roiIndex:
    for each inputGradientCoordinate:
        for each inputCoordinate that contributed to this inputGradient element:
            topYIndex = floor(inputCoordinate.y)
            bottomYIndex = ceil(inputCoordinate.y)
            leftXIndex = floor(inputCoordinate.x)
            rightXIndex = ceil(inputCoordinate.x)

            yLerp = inputCoordinate.y - topYIndex
            xLerp = inputCoordinate.x - leftXIndex

            topLeft = InputTensor[topYIndex][leftXIndex]
            topRight = InputTensor[topYIndex][rightXIndex]
            bottomLeft = InputTensor[bottomYIndex][leftXIndex]
            bottomRight = InputTensor[bottomYIndex][rightXIndex]

            inputGradientWeight = InputGradientTensor[inputGradientCoordinate.y][inputGradientCoordinate.x]
            imageGradY = (1 - xLerp) * (bottomLeft - topLeft) + xLerp * (bottomRight - topRight)
            imageGradX = (1 - yLerp) * (topRight - topLeft) + yLerp * (bottomRight - bottomLeft)

            imageGradY *= inputGradientWeight
            imageGradX *= inputGradientWeight

            OutputROIGradientTensor[roiIndex][0] += imageGradX * (inputWidth - inputGradientCoordinate.x)
            OutputROIGradientTensor[roiIndex][1] += imageGradY * (inputHeight - inputGradientCoordinate.y)
            OutputROIGradientTensor[roiIndex][2] += imageGradX * inputGradientCoordinate.x
            OutputROIGradientTensor[roiIndex][3] += imageGradY * inputGradientCoordinate.y

Either OutputGradientTensor or OutputROIGradientTensor can be omitted if only one is needed, but at least one must be supplied.
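The innermost body of the OutputROIGradientTensor pseudocode above can be sketched in C++ as follows. This is a direct transcription for a single sample on a single channel, not the operator's actual implementation. The Image alias, the RoiGradSample struct, and the AccumulateRoiGradient function are hypothetical; inputWidth and inputHeight are passed through exactly as named in the pseudocode, and the sketch assumes the caller keeps all four neighboring pixels inside the image.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<float>>;  // one channel, indexed [y][x]

// Hypothetical parameter bundle; field names follow the pseudocode.
struct RoiGradSample
{
    float inputX, inputY;           // inputCoordinate
    float gradX, gradY;             // inputGradientCoordinate
    float inputGradientWeight;      // InputGradientTensor value at that coordinate
    float inputWidth, inputHeight;  // used exactly as in the pseudocode
};

// Adds one sample's contribution to the ROI gradient [x1, y1, x2, y2].
void AccumulateRoiGradient(const Image& input, const RoiGradSample& s,
                           std::array<float, 4>& roiGradient)
{
    assert(s.inputX >= 0.0f && s.inputY >= 0.0f);
    const auto topY = static_cast<std::size_t>(std::floor(s.inputY));
    const auto bottomY = static_cast<std::size_t>(std::ceil(s.inputY));
    const auto leftX = static_cast<std::size_t>(std::floor(s.inputX));
    const auto rightX = static_cast<std::size_t>(std::ceil(s.inputX));
    // Assumption: all four neighbors lie inside the image.
    assert(bottomY < input.size() && rightX < input[bottomY].size());

    const float yLerp = s.inputY - static_cast<float>(topY);
    const float xLerp = s.inputX - static_cast<float>(leftX);

    const float topLeft = input[topY][leftX];
    const float topRight = input[topY][rightX];
    const float bottomLeft = input[bottomY][leftX];
    const float bottomRight = input[bottomY][rightX];

    // Image derivatives at the sample, weighted by the incoming gradient.
    const float imageGradY = s.inputGradientWeight *
        ((1 - xLerp) * (bottomLeft - topLeft) + xLerp * (bottomRight - topRight));
    const float imageGradX = s.inputGradientWeight *
        ((1 - yLerp) * (topRight - topLeft) + yLerp * (bottomRight - bottomLeft));

    roiGradient[0] += imageGradX * (s.inputWidth - s.gradX);
    roiGradient[1] += imageGradY * (s.inputHeight - s.gradY);
    roiGradient[2] += imageGradX * s.gradX;
    roiGradient[3] += imageGradY * s.gradY;
}
```

A caller would zero-initialize one std::array<float, 4> per ROI and invoke the function once per contributing sample, mirroring the three nested loops of the pseudocode.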

Syntax

struct DML_ROI_ALIGN_GRAD_OPERATOR_DESC {
  const DML_TENSOR_DESC  *InputTensor;
  const DML_TENSOR_DESC  *InputGradientTensor;
  const DML_TENSOR_DESC  *ROITensor;
  const DML_TENSOR_DESC  *BatchIndicesTensor;
  const DML_TENSOR_DESC  *OutputGradientTensor;
  const DML_TENSOR_DESC  *OutputROIGradientTensor;
  DML_REDUCE_FUNCTION    ReductionFunction;
  DML_INTERPOLATION_MODE InterpolationMode;
  FLOAT                  SpatialScaleX;
  FLOAT                  SpatialScaleY;
  FLOAT                  InputPixelOffset;
  FLOAT                  OutputPixelOffset;
  UINT                   MinimumSamplesPerOutput;
  UINT                   MaximumSamplesPerOutput;
  BOOL                   AlignRegionsToCorners;
};

Members

InputTensor

Type: _Maybenull_ const DML_TENSOR_DESC*

A tensor containing the input data from the forward pass with dimensions { BatchCount, ChannelCount, InputHeight, InputWidth }. This tensor must be supplied when OutputROIGradientTensor is supplied, or when ReductionFunction == DML_REDUCE_FUNCTION_MAX. This is the same tensor that would be supplied to InputTensor for DML_OPERATOR_ROI_ALIGN or DML_OPERATOR_ROI_ALIGN1.

InputGradientTensor

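Type: const DML_TENSOR_DESC*

A tensor containing the incoming gradients. As described above, it has the same sizes as the output of the equivalent DML_OPERATOR_ROI_ALIGN1 in the forward pass.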

ROITensor

Type: const DML_TENSOR_DESC*

A tensor containing the regions of interest (ROI) data: a series of bounding boxes in floating point coordinates that point into the X and Y dimensions of the input tensor. The allowed dimensions of ROITensor are { NumROIs, 4 }, { 1, NumROIs, 4 }, or { 1, 1, NumROIs, 4 }. For each ROI, the values are the coordinates of its top-left and bottom-right corners in the order [x1, y1, x2, y2]. Regions can be empty, meaning that all output pixels come from a single input coordinate, and regions can be inverted (for example, x2 less than x1), meaning that the output receives a mirrored/flipped version of the input. These coordinates are first scaled by SpatialScaleX and SpatialScaleY; if both are 1.0, the region rectangles correspond directly to the input tensor coordinates. This is the same tensor that would be supplied to ROITensor for DML_OPERATOR_ROI_ALIGN or DML_OPERATOR_ROI_ALIGN1.
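As a rough illustration of the ROI layout described above, the following sketch scales one [x1, y1, x2, y2] entry and exposes its signed extent. It assumes that x coordinates are multiplied by SpatialScaleX and y coordinates by SpatialScaleY, and it ignores pixel offsets and corner alignment. ScaledRoi and ScaleRoi are hypothetical helper names, not part of DirectML.

```cpp
#include <array>

// Hypothetical helper type: one ROI after spatial scaling.
struct ScaledRoi
{
    float x1, y1, x2, y2;

    // Signed extents: zero means the axis collapses to a single coordinate
    // (an empty region), negative means the region is inverted along that axis.
    float Width() const { return x2 - x1; }
    float Height() const { return y2 - y1; }
};

// Assumption: x values scale by spatialScaleX, y values by spatialScaleY.
ScaledRoi ScaleRoi(const std::array<float, 4>& roi,
                   float spatialScaleX, float spatialScaleY)
{
    return ScaledRoi{roi[0] * spatialScaleX, roi[1] * spatialScaleY,
                     roi[2] * spatialScaleX, roi[3] * spatialScaleY};
}
```

With both scales set to 1.0, the result equals the ROITensor entry itself, matching the statement that the rectangles then correspond directly to input tensor coordinates.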

BatchIndicesTensor

Type: const DML_TENSOR_DESC*

A tensor containing the batch indices to extract the ROIs from. The allowed dimensions of BatchIndicesTensor are { NumROIs }, { 1, NumROIs }, { 1, 1, NumROIs }, or { 1, 1, 1, NumROIs }. Each value is the index of a batch from InputTensor. The behavior is undefined if the values are not in the range [0, BatchCount). This is the same tensor that would be supplied to BatchIndicesTensor for DML_OPERATOR_ROI_ALIGN or DML_OPERATOR_ROI_ALIGN1.
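Because out-of-range batch indices lead to undefined behavior, a caller may want to validate them on the CPU before uploading the tensor. The sketch below checks the allowed shapes and the [0, BatchCount) range; both function names are hypothetical, and the checks only restate the rules in the paragraph above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// True when sizes is { NumROIs } padded with up to three leading 1s.
bool IsValidBatchIndicesShape(const std::vector<std::uint32_t>& sizes,
                              std::uint32_t numRois)
{
    if (sizes.empty() || sizes.size() > 4 || sizes.back() != numRois)
    {
        return false;
    }
    return std::all_of(sizes.begin(), sizes.end() - 1,
                       [](std::uint32_t size) { return size == 1; });
}

// True when every index selects an existing batch of InputTensor.
bool AreBatchIndicesInRange(const std::vector<std::uint32_t>& batchIndices,
                            std::uint32_t batchCount)
{
    return std::all_of(batchIndices.begin(), batchIndices.end(),
                       [batchCount](std::uint32_t index) { return index < batchCount; });
}
```

For the example at the top of this page, the indices [0, 0, 0, 0] with BatchCount 1 pass both checks.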

OutputGradientTensor

Type: _Maybenull_ const DML_TENSOR_DESC*

An output tensor containing the backpropagated gradients with respect to InputTensor. Typically this tensor would have the same sizes as the input of the corresponding DML_OPERATOR_ROI_ALIGN1 in the forward pass. If OutputROIGradientTensor is not supplied, then OutputGradientTensor must be supplied.

OutputROIGradientTensor

Type: _Maybenull_ const DML_TENSOR_DESC*

An output tensor containing the backpropagated gradients with respect to ROITensor. This tensor needs to have the same sizes as ROITensor. If OutputGradientTensor is not supplied, then OutputROIGradientTensor must be supplied.
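The rules for the three optional tensors can be checked before creating the operator. The sketch below uses stand-in types so that it compiles without DirectML.h; in real code the members would be the const DML_TENSOR_DESC* fields and the DML_REDUCE_FUNCTION value of the operator description. All names in the block are hypothetical.

```cpp
// Stand-ins so the sketch compiles without DirectML.h.
struct TensorDescStandIn {};
enum class ReduceFunctionStandIn { Average, Max };

// Only the members that take part in the optional-tensor rules.
struct RoiAlignGradOptionalTensors
{
    const TensorDescStandIn* inputTensor = nullptr;
    const TensorDescStandIn* outputGradientTensor = nullptr;
    const TensorDescStandIn* outputRoiGradientTensor = nullptr;
    ReduceFunctionStandIn reductionFunction = ReduceFunctionStandIn::Average;
};

bool HasValidOptionalTensors(const RoiAlignGradOptionalTensors& desc)
{
    // At least one of the two outputs must be supplied.
    if (desc.outputGradientTensor == nullptr &&
        desc.outputRoiGradientTensor == nullptr)
    {
        return false;
    }

    // InputTensor is required for the ROI gradient and for max reduction.
    const bool needsInput =
        desc.outputRoiGradientTensor != nullptr ||
        desc.reductionFunction == ReduceFunctionStandIn::Max;
    return !needsInput || desc.inputTensor != nullptr;
}
```

A description that requests only OutputGradientTensor with average reduction passes without InputTensor, while any description that requests OutputROIGradientTensor must also supply InputTensor.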

ReductionFunction

Type: DML_REDUCE_FUNCTION

See DML_ROI_ALIGN1_OPERATOR_DESC::ReductionFunction.

InterpolationMode

Type: DML_INTERPOLATION_MODE

See DML_ROI_ALIGN1_OPERATOR_DESC::InterpolationMode.

SpatialScaleX

Type: FLOAT

See DML_ROI_ALIGN1_OPERATOR_DESC::SpatialScaleX.

SpatialScaleY

Type: FLOAT

See DML_ROI_ALIGN1_OPERATOR_DESC::SpatialScaleY.

InputPixelOffset

Type: FLOAT

See DML_ROI_ALIGN1_OPERATOR_DESC::InputPixelOffset.

OutputPixelOffset

Type: FLOAT

See DML_ROI_ALIGN1_OPERATOR_DESC::OutputPixelOffset.

MinimumSamplesPerOutput

Type: UINT

See DML_ROI_ALIGN1_OPERATOR_DESC::MinimumSamplesPerOutput.

MaximumSamplesPerOutput

Type: UINT

See DML_ROI_ALIGN1_OPERATOR_DESC::MaximumSamplesPerOutput.

AlignRegionsToCorners

Type: BOOL

See DML_ROI_ALIGN1_OPERATOR_DESC::AlignRegionsToCorners.

Remarks

Availability

This operator was introduced in DML_FEATURE_LEVEL_4_1.

Tensor constraints

InputGradientTensor, InputTensor, OutputGradientTensor, OutputROIGradientTensor, and ROITensor must have the same DataType.

Tensor support

DML_FEATURE_LEVEL_4_1 and above

Tensor                  | Kind            | Supported dimension counts | Supported data types
InputTensor             | Optional input  | 4                          | FLOAT32, FLOAT16
InputGradientTensor     | Input           | 4                          | FLOAT32, FLOAT16
ROITensor               | Input           | 2 to 4                     | FLOAT32, FLOAT16
BatchIndicesTensor      | Input           | 1 to 4                     | UINT32
OutputGradientTensor    | Optional output | 4                          | FLOAT32, FLOAT16
OutputROIGradientTensor | Optional output | 2 to 4                     | FLOAT32, FLOAT16

Requirements

Requirement | Value
Header      | directml.h