DequantizeLinear - 13 vs 23

The next section compares an older and a newer version of the same operator after both definitions are converted into markdown text. Green means an addition to the newer version, red means a deletion. Anything else is unchanged.

DequantizeLinear13 → DequantizeLinear23 RENAMED
@@ -1 +1 @@
- The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the full precision tensor.
+ The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the
- The dequantization formula is y = (x - x_zero_point) * x_scale. x_scale and x_zero_point must have same shape, and can be either a scalar
+ full-precision tensor. The dequantization formula is y = (x - x_zero_point) * x_scale. x_scale and x_zero_point
+ must have the same shape, determining the quantization's granularity: a scalar for per-tensor/per-layer quantization,
- for per-tensor / per layer quantization, or a 1-D tensor for per-axis quantization.
+ a 1-D tensor for per-axis quantization, or have a rank identical to the input for blocked quantization.
+ See QuantizeLinear for details on quantization granularity.
+
- x_zero_point and x must have same type. x and y must have same shape. In the case of dequantizing int32,
+ x_zero_point and x must have the same type. x and y must have the same shape. In the case of dequantizing
- there's no zero point (zero point is supposed to be 0).
+ int32, there's no zero point (zero point is supposed to be 0).
+ zero-point is usually not used in the case of float8 and 4-bit types quantization, but the dequantization formula remains the same
+ for consistency. The output type is determined by the attribute output_dtype. If output_dtype is not supplied then the output type
+ is the same as x_scale. The output type also determines the precision of the multiplication operation.
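
To make the dequantization formula and its granularities concrete, here is a minimal NumPy sketch of the per-tensor and per-axis cases (an illustration only, not the ONNX reference implementation; the helper name and defaults are assumptions):

```python
import numpy as np

# Minimal sketch of y = (x - x_zero_point) * x_scale for per-tensor (scalar scale)
# and per-axis (1-D scale) granularity. Illustrative only, not the ONNX reference code.
def dequantize_linear(x, x_scale, x_zero_point=None, axis=1):
    scale = np.asarray(x_scale)
    zp = np.zeros_like(scale) if x_zero_point is None else np.asarray(x_zero_point)
    if scale.ndim == 1:
        # Per-axis: broadcast the 1-D scale/zero-point along `axis`.
        shape = [1] * x.ndim
        shape[axis] = -1
        scale = scale.reshape(shape)
        zp = zp.reshape(shape)
    # Compute in the scale's precision, matching the note that the output type
    # (x_scale's type unless output_dtype says otherwise) sets the multiplication precision.
    return (x.astype(scale.dtype) - zp.astype(scale.dtype)) * scale

# Per-axis example: one scale/zero-point per column of a (2, 3) int8 tensor.
x = np.array([[0, 10, 20], [30, 40, 50]], dtype=np.int8)
y = dequantize_linear(x, np.array([0.1, 0.2, 0.5], dtype=np.float32),
                      np.array([0, 0, 10], dtype=np.int8), axis=1)
# y == [[0., 2., 5.], [3., 8., 20.]] with dtype float32
```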
  ### Attributes
  * **axis - INT** (default is '1'):
- (Optional) The axis of the dequantizing dimension of the input tensor. Ignored for per-tensor quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input).
+ (Optional) The axis of the dequantizing dimension of the input tensor. Used for per-axis and blocked quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input).
+
+ * **block_size - INT** (default is '0'):
+
+ (Optional) The size of the quantization block (number of times every scale is replicated). Used only for blocked quantization. The block size is a positive integer. Given x shape (D0, ..., Di, ..., Dn), y_scale shape (S0, ... Si, ...Sn) and axis=i, the accepted range is [ceil(Di/Si), ceil(Di/(Si-1))-1]
+
+ * **output_dtype - INT** (default is '0'):
+
+ (Optional) The output data type. If not supplied, the output data type is inferred from x_scale data type (T2)
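
The accepted block_size range quoted in the block_size attribute above can be checked with a short sketch (an illustration of the stated formula; the helper name is hypothetical):

```python
import math

# Accepted block_size range [ceil(Di/Si), ceil(Di/(Si-1)) - 1] for blocked quantization,
# where Di is the input's extent on the blocked axis and Si is the scale's extent there.
# Hypothetical helper; assumes Si > 1 (with a single scale block the upper bound is open).
def block_size_range(Di, Si):
    return math.ceil(Di / Si), math.ceil(Di / (Si - 1)) - 1

print(block_size_range(256, 8))  # (32, 36): exactly the block sizes that need 8 scales
```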
  ### Inputs
  Between 2 and 3 inputs.
- - **x** (heterogeneous) - **T**:
+ - **x** (heterogeneous) - **T1**:
  N-D quantized input tensor to be de-quantized.
- - **x_scale** (heterogeneous) - **tensor(float)**:
+ - **x_scale** (heterogeneous) - **T2**:
- Scale for input 'x'. It can be a scalar, which means a per-tensor/layer dequantization, or a 1-D tensor for per-axis dequantization.
+ Scale for input x. For per-tensor/layer dequantization the scale is a scalar, for per per-axis dequantization it is a 1-D Tensor and for blocked dequantization it has the same shape as the input, except for one dimension in which blocking is performed.
- - **x_zero_point** (optional, heterogeneous) - **T**:
+ - **x_zero_point** (optional, heterogeneous) - **T1**:
- Zero point for input 'x'. Shape must match x_scale. It's optional. Zero point is 0 when it's not specified.
+ Zero point for input x. Shape must match x_scale. It's optional. Zero point is 0 when it's not specified.
  ### Outputs
- - **y** (heterogeneous) - **tensor(float)**:
+ - **y** (heterogeneous) - **T3**:
- N-D full precision output tensor. It has same shape as input 'x'.
+ N-D full precision output tensor. It has the same shape as input x. The data type is specified by the output_dtype attribute or, in its absence, the type of x_scale.
  ### Type Constraints
- * **T** in ( tensor(int32), tensor(int8), tensor(uint8) ):
+ * **T1** in ( tensor(float4e2m1), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int4), tensor(int8), tensor(uint16), tensor(uint4), tensor(uint8) ):
- Constrain 'x_zero_point' and 'x' to 8-bit/32-bit integer tensor.
+ The type of the inputs 'x_zero_point' and 'x'.
+ * **T2** in ( tensor(bfloat16), tensor(float), tensor(float16) ):
+
+ The type of the input 'x_scale'.
+ * **T3** in ( tensor(bfloat16), tensor(float), tensor(float16) ):
+
+ The type of the output 'y'.
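
Putting the new pieces together, blocked dequantization can be sketched roughly as replicating each scale block_size times along the blocked axis before applying the usual formula; the helper below is an assumption-laden illustration, not reference code:

```python
import numpy as np

# Rough sketch of blocked dequantization: x_scale has the same rank as x, and each
# scale (and zero point) is replicated block_size times along `axis` before applying
# y = (x - x_zero_point) * x_scale. Illustration only, not the ONNX reference code.
def dequantize_blocked(x, x_scale, x_zero_point=None, axis=0, block_size=1,
                       output_dtype=np.float32):
    zp = np.zeros_like(x_scale) if x_zero_point is None else np.asarray(x_zero_point)
    scale = np.repeat(np.asarray(x_scale), block_size, axis=axis)
    zp = np.repeat(zp, block_size, axis=axis)
    # Trim the replicated blocks to the input's extent on the blocked axis.
    idx = [slice(None)] * x.ndim
    idx[axis] = slice(0, x.shape[axis])
    scale, zp = scale[tuple(idx)], zp[tuple(idx)]
    return (x.astype(output_dtype) - zp.astype(output_dtype)) * scale.astype(output_dtype)

# Example: (4, 2) int8 input, (2, 2) float32 scales, blocking along axis 0, block_size=2.
x = np.arange(8, dtype=np.int8).reshape(4, 2)
scales = np.array([[0.5, 1.0], [2.0, 4.0]], dtype=np.float32)
y = dequantize_blocked(x, scales, axis=0, block_size=2)  # same shape as x, dtype float32
```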