QuantizeLinear - 10 vs 21
The next section compares an older with a newer version of the same operator after both definitions are converted into markdown text. Lines prefixed with + are additions in the newer version (rendered green); lines prefixed with - are deletions (rendered red). Anything else is unchanged.
QuantizeLinear10 → QuantizeLinear21
RENAMED
@@ -1 +1 @@

- The linear …
+ The linear quantization operator consumes a high-precision tensor, a scale, and a zero point to compute the
+ low-precision/quantized tensor. The scale factor and zero point must have the same shape, determining the quantization
- The quantization formula is y = saturate …
+ granularity. The quantization formula is y = saturate((x / y_scale) + y_zero_point).
+ Saturation is done according to:
+ - uint16: [0, 65535]
+ - int16: [-32768, 32767]
+ - uint8: [0, 255]
+ - int8: [-128, 127]
+ - uint4: [0, 15]
+ - int4: [-8, 7]
- For (x / y_scale), it …
+ For (x / y_scale), it rounds to the nearest even. Refer to https://en.wikipedia.org/wiki/Rounding for details.
+ y_zero_point and y must have the same type. y_zero_point is usually not used for quantization to float8 types, but the quantization
+ formula remains the same for consistency, and the type of the attribute y_zero_point still determines the quantization type.
+ There are three supported quantization granularities, determined by the shape of y_scale.
+ In all cases, y_zero_point must have the same shape as y_scale.
+ - Per-tensor (per-layer) quantization: y_scale is a scalar.
+ - Per-axis quantization: The scale must be a 1-D tensor, with the length of the quantization axis. For an input shape
+   (D0, ..., Di, ..., Dn) and axis=i, y_scale is a 1-D tensor of length Di.
+ - Blocked quantization: The scale's shape is identical to the input's shape, except for one dimension, in which
+   blocking is performed. Given x shape (D0, ..., Di, ..., Dn), axis=i, and block size B: y_scale shape is
+   (D0, ..., ceil(Di/B), ..., Dn).
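To make the new formula concrete, here is a minimal NumPy sketch of the per-tensor int8 case (an illustrative reading of the spec, not the official ONNX reference implementation; the function name is ours):

```python
import numpy as np

def quantize_linear_int8(x: np.ndarray, y_scale: float, y_zero_point: int = 0) -> np.ndarray:
    # Sketch of y = saturate((x / y_scale) + y_zero_point) for int8.
    rounded = np.rint(x / y_scale)           # np.rint rounds half to even
    shifted = rounded + y_zero_point         # add the zero point
    saturated = np.clip(shifted, -128, 127)  # saturate to the int8 range
    return saturated.astype(np.int8)

x = np.array([-1.5, -0.5, 0.5, 1.5, 300.0], dtype=np.float32)
print(quantize_linear_int8(x, y_scale=0.5))  # [ -3  -1   1   3 127]
```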
+
+ ### Attributes
+
+ * **axis - INT** (default is '1'):
+
+   (Optional) The axis of the dequantizing dimension of the input tensor. Used only for per-axis and blocked quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input). When the rank of the input is 1, per-tensor quantization is applied, rendering the axis unnecessary in this scenario.
+
+ * **block_size - INT** (default is '0'):
+
+   (Optional) The size of the quantization block (number of times every scale is replicated). Used only for blocked quantization. The block size is a positive integer. Given x shape (D0, ..., Di, ..., Dn), y_scale shape (S0, ..., Si, ..., Sn) and axis=i, the accepted range is [ceil(Di/Si), ceil(Di/(Si-1))-1].
+
+ * **output_dtype - INT** (default is '0'):
+
+   (Optional) The output data type. If not supplied, the output data type is inferred from the y_zero_point data type (T2). If neither output_dtype nor y_zero_point is supplied, the output data type is uint8. If both output_dtype and y_zero_point are specified, output_dtype must be T2.
+
+ * **saturate - INT** (default is '1'):
+
+   The parameter defines how the conversion behaves if an input value is out of range of the destination type. It only applies to float 8 quantization (float8e4m3fn, float8e4m3fnuz, float8e5m2, float8e5m2fnuz). It is true by default. All cases are fully described in two tables inserted in the operator description.
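The shape rules above can be checked with a short Python sketch (the variable names are ours and the example shape is arbitrary): given x of shape (4, 6, 8) and axis=1, it derives the y_scale shape for each granularity and the accepted block_size range.

```python
import math

x_shape = (4, 6, 8)
axis, B = 1, 2

per_tensor_scale_shape = ()                # y_scale is a scalar
per_axis_scale_shape = (x_shape[axis],)    # (6,): one scale per slice along axis
blocked_scale_shape = tuple(               # ceil(Di/B) on the blocked axis only
    math.ceil(d / B) if i == axis else d
    for i, d in enumerate(x_shape)
)                                          # (4, 3, 8)

# Accepted block_size range [ceil(Di/Si), ceil(Di/(Si-1)) - 1] with Di=6, Si=3:
Di, Si = x_shape[axis], blocked_scale_shape[axis]
print(math.ceil(Di / Si), math.ceil(Di / (Si - 1)) - 1)  # 2 2, so B=2 is valid
```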
  ### Inputs

  Between 2 and 3 inputs.

  - **x** (heterogeneous) - **T1**:

    N-D full precision Input tensor to be quantized.

- - **y_scale** (heterogeneous) - ** …
+ - **y_scale** (heterogeneous) - **T1**:

-   Scale for doing quantization to get …
+   Scale for doing quantization to get y. For per-tensor/layer quantization the scale is a scalar, for per-axis quantization it is a 1-D Tensor and for blocked quantization it has the same shape as the input, except for one dimension in which blocking is performed.

  - **y_zero_point** (optional, heterogeneous) - **T2**:

-   Zero point for doing quantization to get …
+   Zero point for doing quantization to get y. Shape must match y_scale. Default is uint8 with zero point of 0 if it's not specified.

  ### Outputs

  - **y** (heterogeneous) - **T2**:

-   N-D quantized output tensor. It has same shape as input …
+   N-D quantized output tensor. It has same shape as input x.

  ### Type Constraints

- * **T1** in ( tensor(float), tensor(int32) ):
+ * **T1** in ( tensor(bfloat16), tensor(float), tensor(float16), tensor(int32) ):

-   Constrain 'x' to float or int32 tensor.
+   The type of the input 'x'.

+ * **T2** in ( tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int4), tensor(int8), tensor(uint16), tensor(uint4), tensor(uint8) ):

-   Constrain 'y_zero_point' and 'y' to 8-bit integer tensor.
+   The type of the input y_zero_point and the output y.