(l-onnx-doc-RoiAlign)=

# RoiAlign


(l-onnx-op-roialign-22)=

## RoiAlign - 22

### Version

- **name**: [RoiAlign (GitHub)](https://github.com/onnx/onnx/blob/main/docs/Operators.md#RoiAlign)
- **domain**: `main`
- **since_version**: `22`
- **function**: `False`
- **support_level**: `SupportType.COMMON`
- **shape inference**: `True`

This version of the operator has been available
**since version 22**.

### Summary

Region of Interest (RoI) align operation described in the
[Mask R-CNN paper](https://arxiv.org/abs/1703.06870).
RoiAlign consumes an input tensor X and regions of interest (rois)
to apply pooling across each RoI; it produces a 4-D tensor of shape
(num_rois, C, output_height, output_width).

RoiAlign avoids misalignment by removing the quantization steps
applied when mapping from the original image to the feature map and
from the feature map to the RoI feature; in each RoI bin, the values
at the sampled locations are computed directly through bilinear
interpolation.
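
The bilinear interpolation step can be illustrated with a minimal pure-Python sketch. The function name `bilinear_sample` and the nested-list feature map are hypothetical, and the boundary handling (zero contribution for points more than one pixel outside the map, clamping otherwise) is an assumption modeled on common implementations, not something this page specifies.

```python
import math

def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2-D feature map (list of rows) at (y, x)."""
    height, width = len(feature), len(feature[0])
    # Assumption: samples far outside the map contribute nothing.
    if y < -1.0 or y > height or x < -1.0 or x > width:
        return 0.0
    # Clamp the sample point onto the valid pixel grid.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y_low, x_low = int(math.floor(y)), int(math.floor(x))
    y_high = min(y_low + 1, height - 1)
    x_high = min(x_low + 1, width - 1)
    # Fractional distances to the lower neighbours give the weights.
    ly, lx = y - y_low, x - x_low
    hy, hx = 1.0 - ly, 1.0 - lx
    return (hy * hx * feature[y_low][x_low]
            + hy * lx * feature[y_low][x_high]
            + ly * hx * feature[y_high][x_low]
            + ly * lx * feature[y_high][x_high])

feature = [[0.0, 1.0],
           [2.0, 3.0]]
print(bilinear_sample(feature, 0.5, 0.5))  # 1.5
```

Because the sample point keeps its fractional position, no rounding to the nearest pixel is needed, which is the quantization that RoiAlign removes.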

### Attributes

* **coordinate_transformation_mode - STRING** (default is `'half_pixel'`):

  Allowed values are 'half_pixel' and 'output_half_pixel'. Use 'half_pixel' to shift the input coordinates by -0.5 pixel (the recommended behavior). Use 'output_half_pixel' to omit the shift for the input (use this for backward-compatible behavior).

* **mode - STRING** (default is `'avg'`):

  The pooling method. Two modes are supported: 'avg' and 'max'. Default is 'avg'.

* **output_height - INT** (default is `'1'`):

  Height of the pooled output Y. Default is 1.

* **output_width - INT** (default is `'1'`):

  Width of the pooled output Y. Default is 1.

* **sampling_ratio - INT** (default is `'0'`):

  Number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If == 0, then an adaptive number of grid points is used (computed as ceil(roi_width / output_width), and likewise for height). Default is 0.

* **spatial_scale - FLOAT** (default is `'1.0'`):

  Multiplicative spatial scale factor to translate RoI coordinates from their input spatial scale to the scale used when pooling, i.e., the spatial scale of the input feature map X relative to the input image. Default is 1.0.
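
How these attributes interact can be sketched by computing the sample points for a single output bin. The helper `bin_sample_points` is hypothetical; placing samples at the centers of an evenly divided bin and forcing at least one sample per axis are assumptions, and edge cases such as degenerate RoIs are not handled.

```python
import math

def bin_sample_points(roi, ph, pw, output_height=1, output_width=1,
                      sampling_ratio=0, spatial_scale=1.0,
                      coordinate_transformation_mode="half_pixel"):
    """Return the (y, x) sample points for output bin (ph, pw) of one RoI."""
    x1, y1, x2, y2 = roi
    # 'half_pixel' shifts the scaled coordinates by -0.5; the other mode does not.
    offset = 0.5 if coordinate_transformation_mode == "half_pixel" else 0.0
    start_w = x1 * spatial_scale - offset
    start_h = y1 * spatial_scale - offset
    roi_width = (x2 * spatial_scale - offset) - start_w
    roi_height = (y2 * spatial_scale - offset) - start_h
    bin_h = roi_height / output_height
    bin_w = roi_width / output_width
    # sampling_ratio == 0 selects the adaptive grid size.
    # Assumption: at least one sample per axis.
    if sampling_ratio > 0:
        grid_h = grid_w = sampling_ratio
    else:
        grid_h = max(1, math.ceil(roi_height / output_height))
        grid_w = max(1, math.ceil(roi_width / output_width))
    points = []
    for iy in range(grid_h):
        y = start_h + ph * bin_h + (iy + 0.5) * bin_h / grid_h
        for ix in range(grid_w):
            x = start_w + pw * bin_w + (ix + 0.5) * bin_w / grid_w
            points.append((y, x))
    return points

# An 8x8 RoI in image space, feature map at half resolution, 2x2 output.
print(bin_sample_points((0, 0, 8, 8), 0, 0, 2, 2,
                        sampling_ratio=2, spatial_scale=0.5))
# [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
```

With `coordinate_transformation_mode="output_half_pixel"` the same call yields points shifted by +0.5 in both directions, since the -0.5 offset is omitted.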

### Inputs

- **X** (heterogeneous) - **T1**:

  Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.
- **rois** (heterogeneous) - **T1**:

  RoIs (Regions of Interest) to pool over; rois is a 2-D input of shape (num_rois, 4) given as [[x1, y1, x2, y2], ...]. The RoIs' coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the 'batch_indices' input.
- **batch_indices** (heterogeneous) - **T2**:

  1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.

### Outputs

- **Y** (heterogeneous) - **T1**:

  RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element Y[r-1] is the pooled feature map corresponding to the r-th RoI, rois[r-1], taken from image batch_indices[r-1] of X.
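
The relationship between the three inputs and the output can be sketched with a small pure-Python version of the 'avg' mode operating on nested lists. This is a sketch under stated assumptions, not the reference implementation: the names `roi_align_avg` and `_bilinear` are hypothetical, and the boundary handling and minimum grid size of 1 are assumptions.

```python
import math

def _bilinear(fmap, y, x):
    """Bilinear interpolation on one channel (assumed boundary handling)."""
    h, w = len(fmap), len(fmap[0])
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * fmap[y0][x0] + (1 - ly) * lx * fmap[y0][x1]
            + ly * (1 - lx) * fmap[y1][x0] + ly * lx * fmap[y1][x1])

def roi_align_avg(X, rois, batch_indices, output_height=1, output_width=1,
                  sampling_ratio=0, spatial_scale=1.0, half_pixel=True):
    """Return Y as nested lists of shape (num_rois, C, output_height, output_width)."""
    assert len(rois) == len(batch_indices)  # 1:1 correspondence
    offset = 0.5 if half_pixel else 0.0
    Y = []
    for (x1, y1, x2, y2), b in zip(rois, batch_indices):
        start_w = x1 * spatial_scale - offset
        start_h = y1 * spatial_scale - offset
        bin_h = (y2 - y1) * spatial_scale / output_height
        bin_w = (x2 - x1) * spatial_scale / output_width
        grid_h = sampling_ratio if sampling_ratio > 0 else max(1, math.ceil(bin_h))
        grid_w = sampling_ratio if sampling_ratio > 0 else max(1, math.ceil(bin_w))
        channels = []
        for fmap in X[b]:  # X[b] is the image selected by batch_indices
            out = [[0.0] * output_width for _ in range(output_height)]
            for ph in range(output_height):
                for pw in range(output_width):
                    total = 0.0
                    for iy in range(grid_h):
                        yy = start_h + ph * bin_h + (iy + 0.5) * bin_h / grid_h
                        for ix in range(grid_w):
                            xx = start_w + pw * bin_w + (ix + 0.5) * bin_w / grid_w
                            total += _bilinear(fmap, yy, xx)
                    out[ph][pw] = total / (grid_h * grid_w)
            channels.append(out)
        Y.append(channels)
    return Y

# Two constant single-channel 2x2 images; three RoIs pick images 1, 0, 1.
X = [[[[1.0, 1.0], [1.0, 1.0]]],
     [[[5.0, 5.0], [5.0, 5.0]]]]
rois = [[0, 0, 2, 2]] * 3
Y = roi_align_avg(X, rois, batch_indices=[1, 0, 1])
print(Y)  # [[[[5.0]]], [[[1.0]]], [[[5.0]]]]
```

The leading dimension of Y follows the number of RoIs, not the batch size N of X; each entry of batch_indices only selects which image an RoI is pooled from.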

### Type Constraints

* **T1** in ( `tensor(bfloat16)`, `tensor(double)`, `tensor(float)`, `tensor(float16)` ):

  Constrain types to float tensors.
* **T2** in ( `tensor(int64)` ):

  Constrain types to int tensors.

```{toctree}
text_diff_RoiAlign_16_22
```

(l-onnx-op-roialign-16)=

## RoiAlign - 16

### Version

- **name**: [RoiAlign (GitHub)](https://github.com/onnx/onnx/blob/main/docs/Operators.md#RoiAlign)
- **domain**: `main`
- **since_version**: `16`
- **function**: `False`
- **support_level**: `SupportType.COMMON`
- **shape inference**: `True`

This version of the operator has been available
**since version 16**.

### Summary

Region of Interest (RoI) align operation described in the
[Mask R-CNN paper](https://arxiv.org/abs/1703.06870).
RoiAlign consumes an input tensor X and regions of interest (rois)
to apply pooling across each RoI; it produces a 4-D tensor of shape
(num_rois, C, output_height, output_width).

RoiAlign avoids misalignment by removing the quantization steps
applied when mapping from the original image to the feature map and
from the feature map to the RoI feature; in each RoI bin, the values
at the sampled locations are computed directly through bilinear
interpolation.

### Attributes

* **coordinate_transformation_mode - STRING** (default is `'half_pixel'`):

  Allowed values are 'half_pixel' and 'output_half_pixel'. Use 'half_pixel' to shift the input coordinates by -0.5 pixel (the recommended behavior). Use 'output_half_pixel' to omit the shift for the input (use this for backward-compatible behavior).

* **mode - STRING** (default is `'avg'`):

  The pooling method. Two modes are supported: 'avg' and 'max'. Default is 'avg'.

* **output_height - INT** (default is `'1'`):

  Height of the pooled output Y. Default is 1.

* **output_width - INT** (default is `'1'`):

  Width of the pooled output Y. Default is 1.

* **sampling_ratio - INT** (default is `'0'`):

  Number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If == 0, then an adaptive number of grid points is used (computed as ceil(roi_width / output_width), and likewise for height). Default is 0.

* **spatial_scale - FLOAT** (default is `'1.0'`):

  Multiplicative spatial scale factor to translate RoI coordinates from their input spatial scale to the scale used when pooling, i.e., the spatial scale of the input feature map X relative to the input image. Default is 1.0.
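
The effect of the two coordinate_transformation_mode values on an RoI can be sketched as follows. The helper `roi_to_feature_coords` is hypothetical; it only illustrates the scaling by spatial_scale and the optional -0.5 shift described above.

```python
def roi_to_feature_coords(roi, spatial_scale=1.0,
                          coordinate_transformation_mode="half_pixel"):
    """Map one RoI [x1, y1, x2, y2] from image to feature-map coordinates."""
    if coordinate_transformation_mode not in ("half_pixel", "output_half_pixel"):
        raise ValueError("unsupported coordinate_transformation_mode")
    # Only 'half_pixel' applies the -0.5 shift after scaling.
    offset = 0.5 if coordinate_transformation_mode == "half_pixel" else 0.0
    return tuple(c * spatial_scale - offset for c in roi)

roi = [4.0, 4.0, 12.0, 12.0]
print(roi_to_feature_coords(roi, 0.25, "half_pixel"))         # (0.5, 0.5, 2.5, 2.5)
print(roi_to_feature_coords(roi, 0.25, "output_half_pixel"))  # (1.0, 1.0, 3.0, 3.0)
```

The RoI keeps the same width and height in both modes; only its position on the feature map moves by half a pixel.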

### Inputs

- **X** (heterogeneous) - **T1**:

  Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.
- **rois** (heterogeneous) - **T1**:

  RoIs (Regions of Interest) to pool over; rois is a 2-D input of shape (num_rois, 4) given as [[x1, y1, x2, y2], ...]. The RoIs' coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the 'batch_indices' input.
- **batch_indices** (heterogeneous) - **T2**:

  1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.

### Outputs

- **Y** (heterogeneous) - **T1**:

  RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element Y[r-1] is the pooled feature map corresponding to the r-th RoI, rois[r-1], taken from image batch_indices[r-1] of X.

### Type Constraints

* **T1** in ( `tensor(double)`, `tensor(float)`, `tensor(float16)` ):

  Constrain types to float tensors.
* **T2** in ( `tensor(int64)` ):

  Constrain types to int tensors.

```{toctree}
text_diff_RoiAlign_10_22
text_diff_RoiAlign_10_16
```

(l-onnx-op-roialign-10)=

## RoiAlign - 10

### Version

- **name**: [RoiAlign (GitHub)](https://github.com/onnx/onnx/blob/main/docs/Operators.md#RoiAlign)
- **domain**: `main`
- **since_version**: `10`
- **function**: `False`
- **support_level**: `SupportType.COMMON`
- **shape inference**: `True`

This version of the operator has been available
**since version 10**.

### Summary

Region of Interest (RoI) align operation described in the
[Mask R-CNN paper](https://arxiv.org/abs/1703.06870).
RoiAlign consumes an input tensor X and regions of interest (rois)
to apply pooling across each RoI; it produces a 4-D tensor of shape
(num_rois, C, output_height, output_width).

RoiAlign avoids misalignment by removing the quantization steps
applied when mapping from the original image to the feature map and
from the feature map to the RoI feature; in each RoI bin, the values
at the sampled locations are computed directly through bilinear
interpolation.

### Attributes

* **mode - STRING** (default is `'avg'`):

  The pooling method. Two modes are supported: 'avg' and 'max'. Default is 'avg'.

* **output_height - INT** (default is `'1'`):

  Height of the pooled output Y. Default is 1.

* **output_width - INT** (default is `'1'`):

  Width of the pooled output Y. Default is 1.

* **sampling_ratio - INT** (default is `'0'`):

  Number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If == 0, then an adaptive number of grid points is used (computed as ceil(roi_width / output_width), and likewise for height). Default is 0.

* **spatial_scale - FLOAT** (default is `'1.0'`):

  Multiplicative spatial scale factor to translate RoI coordinates from their input spatial scale to the scale used when pooling, i.e., the spatial scale of the input feature map X relative to the input image. Default is 1.0.
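
The sampling_ratio rule can be sketched as a small helper that returns the number of sample points per bin along each axis. The name `sampling_grid_size` is hypothetical, and measuring roi_width and roi_height after applying spatial_scale is an assumption not spelled out in the attribute description.

```python
import math

def sampling_grid_size(roi, output_height=1, output_width=1,
                       sampling_ratio=0, spatial_scale=1.0):
    """Return (grid_h, grid_w): sample points per output bin along each axis."""
    x1, y1, x2, y2 = roi
    if sampling_ratio > 0:
        # Fixed grid: exactly sampling_ratio x sampling_ratio points.
        return sampling_ratio, sampling_ratio
    # Adaptive grid (assumption: RoI size measured on the feature map).
    roi_width = (x2 - x1) * spatial_scale
    roi_height = (y2 - y1) * spatial_scale
    return (math.ceil(roi_height / output_height),
            math.ceil(roi_width / output_width))

print(sampling_grid_size([0, 0, 14, 6], 2, 2, 0, 0.5))  # (2, 4)
print(sampling_grid_size([0, 0, 14, 6], 2, 2, 3, 0.5))  # (3, 3)
```

In the adaptive case the grid grows with the RoI, so larger regions are sampled more densely per output bin.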

### Inputs

- **X** (heterogeneous) - **T1**:

  Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.
- **rois** (heterogeneous) - **T1**:

  RoIs (Regions of Interest) to pool over; rois is a 2-D input of shape (num_rois, 4) given as [[x1, y1, x2, y2], ...]. The RoIs' coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the 'batch_indices' input.
- **batch_indices** (heterogeneous) - **T2**:

  1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.

### Outputs

- **Y** (heterogeneous) - **T1**:

  RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element Y[r-1] is the pooled feature map corresponding to the r-th RoI, rois[r-1], taken from image batch_indices[r-1] of X.

### Type Constraints

* **T1** in ( `tensor(double)`, `tensor(float)`, `tensor(float16)` ):

  Constrain types to float tensors.
* **T2** in ( `tensor(int64)` ):

  Constrain types to int tensors.