QLinearMatMul - 10 vs 21

The next section compares an older version of the operator with a newer one after both definitions are converted into markdown text. Green means an addition to the newer version, red means a deletion. Anything else is unchanged.

  1. QLinearMatMul10 → QLinearMatMul21 +11 -9
QLinearMatMul10 → QLinearMatMul21 RENAMED
@@ -1 +1 @@
  Matrix product that behaves like [numpy.matmul](https://numpy.org/doc/stable/reference/generated/numpy.matmul.html).
  It consumes two quantized input tensors, their scales and zero points, scale and zero point of output,
  and computes the quantized output. The quantization formula is y = saturate((x / y_scale) + y_zero_point).
  For (x / y_scale), it is rounding to nearest ties to even. Refer to https://en.wikipedia.org/wiki/Rounding for details.
  Scale and zero point must have same shape. They must be either scalar (per tensor) or N-D tensor
  (per row for 'a' and per column for 'b'). Scalar refers to per tensor quantization whereas N-D refers to per row
  or per column quantization. If the input is 2D of shape [M, K] then zero point and scale tensor may be
  an M element vector [v_1, v_2, ..., v_M] for per row quantization and K element vector of shape [v_1, v_2, ..., v_K]
  for per column quantization. If the input is N-D tensor with shape [D1, D2, M, K] then zero point and scale tensor may
  have shape [D1, D2, M, 1] for per row quantization and shape [D1, D2, 1, K] for per column quantization.
  Production must never overflow, and accumulation may overflow if and only if in 32 bits.
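
The quantization formula in the description can be illustrated with a small pure-Python sketch. It assumes per-tensor (scalar) scales and zero points, uint8 saturation bounds, and a dequantize, multiply, requantize reading of the operator; the helper names `saturate`, `quantize`, and `qlinear_matmul` are hypothetical and not part of ONNX. Python integers do not overflow, so the 32-bit accumulation limit mentioned above is not modeled.

```python
def saturate(value, lo, hi):
    """Clamp value into the closed range [lo, hi]."""
    return max(lo, min(hi, value))


def quantize(x, y_scale, y_zero_point, lo=0, hi=255):
    """y = saturate(round(x / y_scale) + y_zero_point).

    Python 3's built-in round() rounds ties to even, which matches
    the rounding mode named in the operator description.
    """
    return saturate(round(x / y_scale) + y_zero_point, lo, hi)


def qlinear_matmul(a, a_scale, a_zp, b, b_scale, b_zp, y_scale, y_zp,
                   lo=0, hi=255):
    """Per-tensor quantized 2-D matrix product (illustrative only).

    a is a list of rows with shape [M, K], b has shape [K, N].
    Assumption: inputs are dequantized as (q - zero_point) * scale,
    multiplied, and the real-valued result is requantized.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    y = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Integer accumulation of zero-point-corrected products.
            acc = sum((a[i][k] - a_zp) * (b[k][j] - b_zp)
                      for k in range(inner))
            real = acc * a_scale * b_scale
            out_row.append(quantize(real, y_scale, y_zp, lo, hi))
        y.append(out_row)
    return y


result = qlinear_matmul([[2, 3]], 0.5, 1, [[4], [5]], 0.5, 0, 0.25, 10)
print(result)
```

For int8 outputs the same sketch applies with `lo=-128, hi=127`. Per-row and per-column scales would replace the scalar `a_scale` and `b_scale` with one value per row of `a` and per column of `b`.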
  ### Inputs
  - **a** (heterogeneous) - **T1**:
  N-dimensional quantized matrix a
- - **a_scale** (heterogeneous) - **tensor(float)**:
+ - **a_scale** (heterogeneous) - **TS**:
  scale of quantized input a
  - **a_zero_point** (heterogeneous) - **T1**:
  zero point of quantized input a
  - **b** (heterogeneous) - **T2**:
  N-dimensional quantized matrix b
- - **b_scale** (heterogeneous) - **tensor(float)**:
+ - **b_scale** (heterogeneous) - **TS**:
  scale of quantized input b
  - **b_zero_point** (heterogeneous) - **T2**:
  zero point of quantized input b
- - **y_scale** (heterogeneous) - **tensor(float)**:
+ - **y_scale** (heterogeneous) - **TS**:
  scale of quantized output y
  - **y_zero_point** (heterogeneous) - **T3**:
  zero point of quantized output y
  ### Outputs
  - **y** (heterogeneous) - **T3**:
  Quantized matrix multiply results from a * b
  ### Type Constraints
- * **T1** in ( tensor(int8), tensor(uint8) ):
+ * **TS** in ( tensor(bfloat16), tensor(float), tensor(float16) ):
- Constrain input a and its zero point data type to 8-bit integer tensor.
+ Constrain scales.
- * **T2** in ( tensor(int8), tensor(uint8) ):
+ * **T1** in ( tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int8), tensor(uint8) ):
- Constrain input b and its zero point data type to 8-bit integer tensor.
+ The type of input a and its zeropoint.
- * **T3** in ( tensor(int8), tensor(uint8) ):
+ * **T2** in ( tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int8), tensor(uint8) ):
- Constrain output y and its zero point data type to 8-bit integer tensor.
+ The type of input b and its zeropoint.
+ * **T3** in ( tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int8), tensor(uint8) ):
+
+ The type of the output and its zeropoint.
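
The net effect of the type-constraint changes can be summarized with a short sketch. It copies the type lists from the diff above into plain Python sets and computes what version 21 accepts that version 10 did not. The dictionary layout, the `"scales"` key (version 10 has no `TS` parameter and fixes scales to `tensor(float)`), and the function `added_types` are illustrative choices, not an ONNX API.

```python
INT8 = {"tensor(int8)", "tensor(uint8)"}
FLOAT8 = {
    "tensor(float8e4m3fn)",
    "tensor(float8e4m3fnuz)",
    "tensor(float8e5m2)",
    "tensor(float8e5m2fnuz)",
}

# Allowed types per constraint, transcribed from the diff.
V10 = {"scales": {"tensor(float)"}, "T1": INT8, "T2": INT8, "T3": INT8}
V21 = {
    "scales": {"tensor(bfloat16)", "tensor(float)", "tensor(float16)"},
    "T1": INT8 | FLOAT8,
    "T2": INT8 | FLOAT8,
    "T3": INT8 | FLOAT8,
}


def added_types(old, new):
    """Return, per constraint, the types present in new but not in old."""
    return {name: sorted(new[name] - old[name]) for name in new}


added = added_types(V10, V21)
for name, types in added.items():
    print(name, "->", ", ".join(types))
```

Every type accepted in version 10 is still accepted in version 21, so as far as these type constraints go, the change only widens what the operator accepts.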