RoiAlign - 10 vs 22

The next section compares an older version of the operator with a newer one after both definitions are converted into markdown text. Green marks an addition in the newer version, red marks a deletion. Anything else is unchanged.

Files changed (1)
  1. RoiAlign10 → RoiAlign22 +5 -1
RoiAlign10 → RoiAlign22 RENAMED
@@ -1 +1 @@
1
1
  Region of Interest (RoI) align operation described in the
2
2
  [Mask R-CNN paper](https://arxiv.org/abs/1703.06870).
3
3
  RoiAlign consumes an input tensor X and regions of interest (rois)
4
4
  to apply pooling across each RoI; it produces a 4-D tensor of shape
5
5
  (num_rois, C, output_height, output_width).
6
6
  RoiAlign is proposed to avoid the misalignment by removing
7
7
  quantizations while converting from original image into feature
8
8
  map and from feature map into RoI feature; in each ROI bin,
9
9
  the values of the sampled locations are computed directly
10
10
  through bilinear interpolation.
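
The description above says each sampled location is computed through bilinear interpolation rather than by snapping to a grid cell. A minimal sketch of that step on a single-channel 2-D map follows. The function name `bilinear_sample` and the clamping of out-of-range points to the border are assumptions made for illustration; they are not taken from the operator text.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map (a list of rows) at a fractional (y, x) location."""
    height, width = len(feature), len(feature[0])
    # Assumption: clamp the sample point to the valid index range of the map.
    y = min(max(y, 0.0), height - 1)
    x = min(max(x, 0.0), width - 1)
    # Integer corners surrounding the sample point.
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    # Fractional distances to the low corner and their complements.
    ly, lx = y - y0, x - x0
    hy, hx = 1.0 - ly, 1.0 - lx
    # Weighted sum of the four neighbouring values.
    return (hy * hx * feature[y0][x0] + hy * lx * feature[y0][x1]
            + ly * hx * feature[y1][x0] + ly * lx * feature[y1][x1])


feature = [[0.0, 1.0],
           [2.0, 3.0]]
center = bilinear_sample(feature, 0.5, 0.5)  # equal weight on all four cells
```

Because the point (0.5, 0.5) is equidistant from all four cells, `center` is their plain average; no coordinate is rounded at any step.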
11
11
  ### Attributes
12
+
13
+ * **coordinate_transformation_mode - STRING** (default is 'half_pixel'):
14
+
15
+ Allowed values are 'half_pixel' and 'output_half_pixel'. Use the value 'half_pixel' to pixel shift the input coordinates by -0.5 (the recommended behavior). Use the value 'output_half_pixel' to omit the pixel shift for the input (use this for a backward-compatible behavior).
12
16
  * **mode - STRING** (default is 'avg'):
13
17
  The pooling method. Two modes are supported: 'avg' and 'max'. Default is 'avg'.
14
18
  * **output_height - INT** (default is '1'):
15
19
  default 1; Pooled output Y's height.
16
20
  * **output_width - INT** (default is '1'):
17
21
  default 1; Pooled output Y's width.
18
22
  * **sampling_ratio - INT** (default is '0'):
19
23
  Number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If == 0, then an adaptive number of grid points is used (computed as ceil(roi_width / output_width), and likewise for height). Default is 0.
20
24
  * **spatial_scale - FLOAT** (default is '1.0'):
21
25
  Multiplicative spatial scale factor to translate ROI coordinates from their input spatial scale to the scale used when pooling, i.e., spatial scale of the input feature map X relative to the input image. Default is 1.0.
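
The attributes above interact when an RoI is mapped onto the feature map: `spatial_scale` rescales the coordinates, `coordinate_transformation_mode` decides whether the -0.5 shift is applied, and `sampling_ratio` fixes or adapts the grid size per bin. The sketch below illustrates that arithmetic. The helper names `scaled_roi` and `grid_points` are hypothetical, and applying the shift after the scaling is an assumption; the attribute text does not state the order.

```python
import math


def scaled_roi(roi, spatial_scale=1.0, coordinate_transformation_mode="half_pixel"):
    """Map an [x1, y1, x2, y2] RoI from image coordinates to feature-map coordinates."""
    if coordinate_transformation_mode == "half_pixel":
        offset = 0.5  # shift the input coordinates by -0.5 (new default)
    elif coordinate_transformation_mode == "output_half_pixel":
        offset = 0.0  # no shift, the backward-compatible behavior
    else:
        raise ValueError("unsupported coordinate_transformation_mode")
    # Assumption: the shift is applied after multiplying by spatial_scale.
    return [c * spatial_scale - offset for c in roi]


def grid_points(roi_size, output_size, sampling_ratio=0):
    """Number of sampling points per pooled bin along one axis."""
    if sampling_ratio > 0:
        return sampling_ratio
    # Adaptive case: ceil(roi_size / output_size), as in the attribute text.
    return int(math.ceil(roi_size / output_size))


roi = [0.0, 0.0, 16.0, 8.0]
shifted = scaled_roi(roi, spatial_scale=0.5)
legacy = scaled_roi(roi, spatial_scale=0.5,
                    coordinate_transformation_mode="output_half_pixel")
roi_width = shifted[2] - shifted[0]
adaptive = grid_points(roi_width, output_size=2)
fixed = grid_points(roi_width, output_size=2, sampling_ratio=3)
```

With these inputs the two modes differ only by the half-pixel offset on every coordinate, so the RoI width is the same in both and the adaptive grid size is unaffected by the mode.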
22
26
  ### Inputs
23
27
  - **X** (heterogeneous) - **T1**:
24
28
  Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.
25
29
  - **rois** (heterogeneous) - **T1**:
26
30
  RoIs (Regions of Interest) to pool over; rois is 2-D input of shape (num_rois, 4) given as [[x1, y1, x2, y2], ...]. The RoIs' coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the 'batch_indices' input.
27
31
  - **batch_indices** (heterogeneous) - **T2**:
28
32
  1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.
29
33
  ### Outputs
30
34
  - **Y** (heterogeneous) - **T1**:
31
35
  RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element Y[r-1] is a pooled feature map corresponding to the r-th RoI rois[r-1].
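
The shape rules in the Inputs and Outputs sections can be checked with a few lines of code. This is a rough sketch only; `roi_align_output_shape` is a hypothetical helper, not part of any ONNX API, and it validates shapes without doing any pooling.

```python
def roi_align_output_shape(x_shape, rois_shape, batch_indices_shape,
                           output_height=1, output_width=1):
    """Return the shape of Y for the given input shapes, or raise ValueError."""
    if len(x_shape) != 4:
        raise ValueError("X must be 4-D: (N, C, H, W)")
    if len(rois_shape) != 2 or rois_shape[1] != 4:
        raise ValueError("rois must be 2-D: (num_rois, 4)")
    num_rois = rois_shape[0]
    # Each RoI needs exactly one batch index.
    if tuple(batch_indices_shape) != (num_rois,):
        raise ValueError("batch_indices must be 1-D: (num_rois,)")
    channels = x_shape[1]
    return (num_rois, channels, output_height, output_width)


# Two images, 256 channels, five RoIs pooled to 7x7.
shape = roi_align_output_shape((2, 256, 50, 50), (5, 4), (5,),
                               output_height=7, output_width=7)
```

Note that the batch size N and the spatial size H x W of X do not appear in the output shape; only the channel count carries over.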
32
36
  ### Type Constraints
33
- * **T1** in ( tensor(double), tensor(float), tensor(float16) ):
37
+ * **T1** in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ):
34
38
  Constrain types to float tensors.
35
39
  * **T2** in ( tensor(int64) ):
36
40
  Constrain types to int tensors.