DequantizeLinear - 19 vs 23

The next section compares an older version of the same operator with a newer one, after both definitions are converted into markdown text. Green means an addition to the newer version, red means a deletion. Anything else is unchanged.

DequantizeLinear19 → DequantizeLinear23 RENAMED
@@ -1 +1 @@
- The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the full precision tensor.
+ The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the
- The dequantization formula is y = (x - x_zero_point) * x_scale. x_scale and x_zero_point must have same shape, and can be either a scalar
+ full-precision tensor. The dequantization formula is y = (x - x_zero_point) * x_scale. x_scale and x_zero_point
+ must have the same shape, determining the quantization's granularity: a scalar for per-tensor/per-layer quantization,
- for per-tensor / per layer quantization, or a 1-D tensor for per-axis quantization.
+ a 1-D tensor for per-axis quantization, or have a rank identical to the input for blocked quantization.
+ See QuantizeLinear for details on quantization granularity.
+
- x_zero_point and x must have same type. x and y must have same shape. In the case of dequantizing int32,
+ x_zero_point and x must have the same type. x and y must have the same shape. In the case of dequantizing
- there's no zero point (zero point is supposed to be 0).
+ int32, there's no zero point (zero point is supposed to be 0).
- zero-point is usually not used in the case of float8e4m3fn, float8e4m3fnuz, float8e5m2, float8e5m2fnuz quantization,
+ zero-point is usually not used in the case of float8 and 4-bit types quantization, but the dequantization formula remains the same
- but the dequantization formula remains the same for consistency and 'x_scale' still determines the output type.
+ for consistency. The output type is determined by the attribute output_dtype. If output_dtype is not supplied then the output type
+ is the same as x_scale. The output type also determines the precision of the multiplication operation.
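To make the formula concrete, here is a minimal per-tensor sketch in NumPy. It is an illustration only, not the official ONNX reference implementation; the helper name `dequantize_linear` is chosen for this example.

```python
import numpy as np

def dequantize_linear(x, x_scale, x_zero_point=None):
    """Per-tensor sketch of y = (x - x_zero_point) * x_scale."""
    scale = np.asarray(x_scale)
    if x_zero_point is None:
        # e.g. the int32 case: zero point is assumed to be 0
        x_zero_point = np.zeros_like(x)
    # Subtract in a wide integer type so e.g. uint8 - uint8 cannot wrap
    diff = x.astype(np.int64) - np.asarray(x_zero_point).astype(np.int64)
    # Per the spec, the output type (here the scale's type, standing in
    # for an absent output_dtype) is the type of the result
    return (diff * scale).astype(scale.dtype)

x = np.array([0, 128, 255], dtype=np.uint8)
y = dequantize_linear(x, np.float32(0.5), np.uint8(128))
# y: [-64.0, 0.0, 63.5], dtype float32
```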
  ### Attributes
  * **axis - INT** (default is '1'):
- (Optional) The axis of the dequantizing dimension of the input tensor. Used only for per-axis quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input). When the rank of the input is 1, per-tensor quantization is applied, rendering the axis unnecessary in this scenario.
+ (Optional) The axis of the dequantizing dimension of the input tensor. Used for per-axis and blocked quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input).
+
+ * **block_size - INT** (default is '0'):
+
+ (Optional) The size of the quantization block (number of times every scale is replicated). Used only for blocked quantization. The block size is a positive integer. Given x shape (D0, ..., Di, ..., Dn), y_scale shape (S0, ... Si, ...Sn) and axis=i, the accepted range is [ceil(Di/Si), ceil(Di/(Si-1))-1]
+
+ * **output_dtype - INT** (default is '0'):
+
+ (Optional) The output data type. If not supplied, the output data type is inferred from x_scale data type (T2)
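The blocked case and the accepted `block_size` range can be sketched as follows. This is a simplified illustration under the assumption that the zero point is omitted; the helper name `blocked_dequantize` is hypothetical, not an ONNX API.

```python
import math
import numpy as np

def blocked_dequantize(x, x_scale, axis, block_size):
    """Sketch of blocked dequantization (zero point omitted):
    every scale value is replicated block_size times along `axis`."""
    d_i = x.shape[axis]
    s_i = x_scale.shape[axis]
    # Accepted range from the spec: [ceil(Di/Si), ceil(Di/(Si-1)) - 1]
    lo = math.ceil(d_i / s_i)
    hi = math.ceil(d_i / (s_i - 1)) - 1 if s_i > 1 else d_i
    assert lo <= block_size <= hi, "block_size out of the accepted range"
    # Replicate each scale block_size times, trimming in case
    # Di is not an exact multiple of block_size
    scale = np.repeat(x_scale, block_size, axis=axis)
    scale = np.take(scale, range(d_i), axis=axis)
    return (x.astype(np.int64) * scale).astype(x_scale.dtype)

x = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int8)
s = np.array([[0.5, 2.0], [1.0, 0.25]], dtype=np.float32)  # 2 blocks of size 2
y = blocked_dequantize(x, s, axis=1, block_size=2)
# y: [[0.5, 1.0, 6.0, 8.0], [5.0, 6.0, 1.75, 2.0]]
```

Note how `x_scale` has the same rank as `x` but a smaller extent along `axis`, matching the blocked-granularity shape rule in the description above.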
  ### Inputs
  Between 2 and 3 inputs.
  - **x** (heterogeneous) - **T1**:
  N-D quantized input tensor to be de-quantized.
  - **x_scale** (heterogeneous) - **T2**:
- Scale for input 'x'. It can be a scalar, which means a per-tensor/layer dequantization, or a 1-D tensor for per-axis dequantization.
+ Scale for input x. For per-tensor/layer dequantization the scale is a scalar, for per-axis dequantization it is a 1-D Tensor and for blocked dequantization it has the same shape as the input, except for one dimension in which blocking is performed.
  - **x_zero_point** (optional, heterogeneous) - **T1**:
- Zero point for input 'x'. Shape must match x_scale. It's optional. Zero point is 0 when it's not specified.
+ Zero point for input x. Shape must match x_scale. It's optional. Zero point is 0 when it's not specified.
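For the per-axis case, the 1-D scale and zero point pair off with one slice of `x` along the chosen axis. A hedged NumPy sketch (the helper name `per_axis_dequantize` is illustrative, not part of ONNX) that reshapes them so they broadcast along that axis:

```python
import numpy as np

def per_axis_dequantize(x, x_scale, x_zero_point, axis):
    """Sketch of per-axis dequantization: the 1-D scale and
    zero point are broadcast along `axis` of x."""
    shape = [1] * x.ndim
    shape[axis] = -1  # scale length must equal x.shape[axis]
    scale = x_scale.reshape(shape)
    zp = x_zero_point.reshape(shape).astype(np.int64)
    return ((x.astype(np.int64) - zp) * scale).astype(x_scale.dtype)

x = np.array([[0, 10], [20, 30]], dtype=np.uint8)
scale = np.array([0.1, 1.0], dtype=np.float32)  # one scale per column
zp = np.array([0, 10], dtype=np.uint8)          # one zero point per column
y = per_axis_dequantize(x, scale, zp, axis=1)
# column 0 uses (scale 0.1, zp 0); column 1 uses (scale 1.0, zp 10)
```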
  ### Outputs
- - **y** (heterogeneous) - **T2**:
+ - **y** (heterogeneous) - **T3**:
- N-D full precision output tensor. It has same shape as input 'x'.
+ N-D full precision output tensor. It has the same shape as input x. The data type is specified by the output_dtype attribute or, in its absence, the type of x_scale.
  ### Type Constraints
- * **T1** in ( tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int32), tensor(int8), tensor(uint8) ):
+ * **T1** in ( tensor(float4e2m1), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int4), tensor(int8), tensor(uint16), tensor(uint4), tensor(uint8) ):
- Constrain 'x_zero_point' and 'x' to 8-bit integer or float, or /32-bit integer tensor.
+ The type of the inputs 'x_zero_point' and 'x'.
  * **T2** in ( tensor(bfloat16), tensor(float), tensor(float16) ):
+ * **T3** in ( tensor(bfloat16), tensor(float), tensor(float16) ):
+
- 'x_scale' determines the output type.
+ The type of the input 'x_scale'.
+ The type of the output 'y'.