RoiAlign

RoiAlign - 22

Version

  • name: RoiAlign (GitHub)

  • domain: main

  • since_version: 22

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 22.

Summary

Region of Interest (RoI) align operation described in the Mask R-CNN paper. RoiAlign consumes an input tensor X and region of interests (rois) to apply pooling across each RoI; it produces a 4-D tensor of shape (num_rois, C, output_height, output_width).

RoiAlign is proposed to avoid the misalignment by removing quantizations while converting from original image into feature map and from feature map into RoI feature; in each ROI bin, the value of the sampled locations are computed directly through bilinear interpolation.

Attributes

  • coordinate_transformation_mode - STRING (default is 'half_pixel'):

    Allowed values are ‘half_pixel’ and ‘output_half_pixel’. Use the value ‘half_pixel’ to pixel shift the input coordinates by -0.5 (the recommended behavior). Use the value ‘output_half_pixel’ to omit the pixel shift for the input (use this for a backward-compatible behavior).

  • mode - STRING (default is 'avg'):

    The pooling method. Two modes are supported: ‘avg’ and ‘max’. Default is ‘avg’.

  • output_height - INT (default is '1'):

    default 1; Pooled output Y’s height.

  • output_width - INT (default is '1'):

    default 1; Pooled output Y’s width.

  • sampling_ratio - INT (default is '0'):

    Number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If == 0, then an adaptive number of grid points are used (computed as ceil(roi_width / output_width), and likewise for height). Default is 0.

  • spatial_scale - FLOAT (default is '1.0'):

    Multiplicative spatial scale factor to translate ROI coordinates from their input spatial scale to the scale used when pooling, i.e., spatial scale of the input feature map X relative to the input image. E.g.; default is 1.0f.

Inputs

  • X (heterogeneous) - T1:

    Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.

  • rois (heterogeneous) - T1:

    RoIs (Regions of Interest) to pool over; rois is 2-D input of shape (num_rois, 4) given as [[x1, y1, x2, y2], …]. The RoIs’ coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the ‘batch_indices’ input.

  • batch_indices (heterogeneous) - T2:

    1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.

Outputs

  • Y (heterogeneous) - T1:

    RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element Y[r-1] is a pooled feature map corresponding to the r-th RoI X[r-1].

Type Constraints

  • T1 in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ):

    Constrain types to float tensors.

  • T2 in ( tensor(int64) ):

    Constrain types to int tensors.

RoiAlign - 16

Version

  • name: RoiAlign (GitHub)

  • domain: main

  • since_version: 16

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 16.

Summary

Region of Interest (RoI) align operation described in the Mask R-CNN paper. RoiAlign consumes an input tensor X and region of interests (rois) to apply pooling across each RoI; it produces a 4-D tensor of shape (num_rois, C, output_height, output_width).

RoiAlign is proposed to avoid the misalignment by removing quantizations while converting from original image into feature map and from feature map into RoI feature; in each ROI bin, the value of the sampled locations are computed directly through bilinear interpolation.

Attributes

  • coordinate_transformation_mode - STRING (default is 'half_pixel'):

    Allowed values are ‘half_pixel’ and ‘output_half_pixel’. Use the value ‘half_pixel’ to pixel shift the input coordinates by -0.5 (the recommended behavior). Use the value ‘output_half_pixel’ to omit the pixel shift for the input (use this for a backward-compatible behavior).

  • mode - STRING (default is 'avg'):

    The pooling method. Two modes are supported: ‘avg’ and ‘max’. Default is ‘avg’.

  • output_height - INT (default is '1'):

    default 1; Pooled output Y’s height.

  • output_width - INT (default is '1'):

    default 1; Pooled output Y’s width.

  • sampling_ratio - INT (default is '0'):

    Number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If == 0, then an adaptive number of grid points are used (computed as ceil(roi_width / output_width), and likewise for height). Default is 0.

  • spatial_scale - FLOAT (default is '1.0'):

    Multiplicative spatial scale factor to translate ROI coordinates from their input spatial scale to the scale used when pooling, i.e., spatial scale of the input feature map X relative to the input image. E.g.; default is 1.0f.

Inputs

  • X (heterogeneous) - T1:

    Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.

  • rois (heterogeneous) - T1:

    RoIs (Regions of Interest) to pool over; rois is 2-D input of shape (num_rois, 4) given as [[x1, y1, x2, y2], …]. The RoIs’ coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the ‘batch_indices’ input.

  • batch_indices (heterogeneous) - T2:

    1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.

Outputs

  • Y (heterogeneous) - T1:

    RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element Y[r-1] is a pooled feature map corresponding to the r-th RoI X[r-1].

Type Constraints

  • T1 in ( tensor(double), tensor(float), tensor(float16) ):

    Constrain types to float tensors.

  • T2 in ( tensor(int64) ):

    Constrain types to int tensors.

RoiAlign - 10

Version

  • name: RoiAlign (GitHub)

  • domain: main

  • since_version: 10

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 10.

Summary

Region of Interest (RoI) align operation described in the Mask R-CNN paper. RoiAlign consumes an input tensor X and region of interests (rois) to apply pooling across each RoI; it produces a 4-D tensor of shape (num_rois, C, output_height, output_width).

RoiAlign is proposed to avoid the misalignment by removing quantizations while converting from original image into feature map and from feature map into RoI feature; in each ROI bin, the value of the sampled locations are computed directly through bilinear interpolation.

Attributes

  • mode - STRING (default is 'avg'):

    The pooling method. Two modes are supported: ‘avg’ and ‘max’. Default is ‘avg’.

  • output_height - INT (default is '1'):

    default 1; Pooled output Y’s height.

  • output_width - INT (default is '1'):

    default 1; Pooled output Y’s width.

  • sampling_ratio - INT (default is '0'):

    Number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If == 0, then an adaptive number of grid points are used (computed as ceil(roi_width / output_width), and likewise for height). Default is 0.

  • spatial_scale - FLOAT (default is '1.0'):

    Multiplicative spatial scale factor to translate ROI coordinates from their input spatial scale to the scale used when pooling, i.e., spatial scale of the input feature map X relative to the input image. E.g.; default is 1.0f.

Inputs

  • X (heterogeneous) - T1:

    Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.

  • rois (heterogeneous) - T1:

    RoIs (Regions of Interest) to pool over; rois is 2-D input of shape (num_rois, 4) given as [[x1, y1, x2, y2], …]. The RoIs’ coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the ‘batch_indices’ input.

  • batch_indices (heterogeneous) - T2:

    1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.

Outputs

  • Y (heterogeneous) - T1:

    RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element Y[r-1] is a pooled feature map corresponding to the r-th RoI X[r-1].

Type Constraints

  • T1 in ( tensor(double), tensor(float), tensor(float16) ):

    Constrain types to float tensors.

  • T2 in ( tensor(int64) ):

    Constrain types to int tensors.