QLinearMatMul - 10 vs 21

The next section compares an older and a newer version of the same operator after both definitions are converted into markdown text. Lines prefixed with + (green) are additions in the newer version, and lines prefixed with - (red) are deletions. Everything else is unchanged.

QLinearMatMul10 → QLinearMatMul21 +11 -9

@@ -1 +1 @@
  Matrix product that behaves like [numpy.matmul](https://numpy.org/doc/stable/reference/generated/numpy.matmul.html).
  It consumes two quantized input tensors, their scales and zero points, scale and zero point of output,
  and computes the quantized output. The quantization formula is y = saturate((x / y_scale) + y_zero_point).
  For (x / y_scale), it is rounding to nearest ties to even. Refer to https://en.wikipedia.org/wiki/Rounding for details.
  Scale and zero point must have same shape. They must be either scalar (per tensor) or N-D tensor
  (per row for 'a' and per column for 'b'). Scalar refers to per tensor quantization whereas N-D refers to per row
  or per column quantization. If the input is 2D of shape [M, K] then zero point and scale tensor may be
  an M element vector [v_1, v_2, ..., v_M] for per row quantization and K element vector of shape [v_1, v_2, ..., v_K]
  for per column quantization. If the input is N-D tensor with shape [D1, D2, M, K] then zero point and scale tensor may
  have shape [D1, D2, M, 1] for per row quantization and shape [D1, D2, 1, K] for per column quantization.
  Production must never overflow, and accumulation may overflow if and only if in 32 bits.
  ### Inputs
  - **a** (heterogeneous) - **T1**:
    N-dimensional quantized matrix a
- - **a_scale** (heterogeneous) - **tensor(float)**:
+ - **a_scale** (heterogeneous) - **TS**:
    scale of quantized input a
  - **a_zero_point** (heterogeneous) - **T1**:
    zero point of quantized input a
  - **b** (heterogeneous) - **T2**:
    N-dimensional quantized matrix b
- - **b_scale** (heterogeneous) - **tensor(float)**:
+ - **b_scale** (heterogeneous) - **TS**:
    scale of quantized input b
  - **b_zero_point** (heterogeneous) - **T2**:
    zero point of quantized input b
- - **y_scale** (heterogeneous) - **tensor(float)**:
+ - **y_scale** (heterogeneous) - **TS**:
    scale of quantized output y
  - **y_zero_point** (heterogeneous) - **T3**:
    zero point of quantized output y
  ### Outputs
  - **y** (heterogeneous) - **T3**:
    Quantized matrix multiply results from a * b
  ### Type Constraints
- * **T1** in ( tensor(int8), tensor(uint8) ):
+ * **TS** in ( tensor(bfloat16), tensor(float), tensor(float16) ):
-   Constrain input a and its zero point data type to 8-bit integer tensor.
+   Constrain scales.
- * **T2** in ( tensor(int8), tensor(uint8) ):
+ * **T1** in ( tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int8), tensor(uint8) ):
-   Constrain input b and its zero point data type to 8-bit integer tensor.
+   The type of input a and its zeropoint.
- * **T3** in ( tensor(int8), tensor(uint8) ):
+ * **T2** in ( tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int8), tensor(uint8) ):
-   Constrain output y and its zero point data type to 8-bit integer tensor.
+   The type of input b and its zeropoint.
+ * **T3** in ( tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int8), tensor(uint8) ):
+   The type of the output and its zeropoint.
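In this diff, only the input, output, and type-constraint sections change (the scales move from `tensor(float)` to `TS`, and float8 element types are added). The operator description stays unchanged context. As a rough illustration of that description for the 8-bit integer case, here is a minimal pure-Python sketch for 2-D inputs. It assumes `a` has shape [M, K] and `b` has shape [K, N]. It also assumes a scalar scale or zero point means per-tensor quantization, while a list means per-row quantization for `a` (M entries) or per-column quantization for `b` (N entries), and that `y_scale`/`y_zero_point` are scalars. The function name `qlinear_matmul_2d` and its `out_type` parameter are hypothetical and are not part of ONNX or any runtime. It does not cover float8 types, N-D inputs, or emulation of 32-bit accumulator wraparound.

```python
from typing import List, Sequence, Union

Matrix = List[List[int]]
Param = Union[float, Sequence[float]]

# Saturation bounds for the 8-bit integer types allowed in both versions.
INT_RANGES = {"int8": (-128, 127), "uint8": (0, 255)}


def _per_index(param: Param, n: int) -> list:
    """Broadcast a scalar (per-tensor) parameter to n entries, or check a vector's length."""
    if isinstance(param, (int, float)):
        return [param] * n
    if len(param) != n:
        raise ValueError(f"expected {n} entries, got {len(param)}")
    return list(param)


def qlinear_matmul_2d(a: Matrix, a_scale: Param, a_zero_point: Param,
                      b: Matrix, b_scale: Param, b_zero_point: Param,
                      y_scale: float, y_zero_point: int,
                      out_type: str = "uint8") -> Matrix:
    """Hypothetical helper: quantized 2-D matmul sketch (8-bit integer types only)."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    a_s = _per_index(a_scale, m)        # per tensor, or one entry per row of a
    a_zp = _per_index(a_zero_point, m)
    b_s = _per_index(b_scale, n)        # per tensor, or one entry per column of b
    b_zp = _per_index(b_zero_point, n)
    lo, hi = INT_RANGES[out_type]

    y: Matrix = []
    for i in range(m):
        row = []
        for j in range(n):
            # Integer accumulation of zero-point-corrected products.
            # Python ints never overflow; a real kernel would accumulate in int32.
            acc = sum((a[i][t] - a_zp[i]) * (b[t][j] - b_zp[j]) for t in range(k))
            # Real-valued product x, then y = saturate(round(x / y_scale) + y_zero_point).
            x = acc * a_s[i] * b_s[j]
            q = round(x / y_scale) + y_zero_point  # round() on a float ties to even
            row.append(min(max(q, lo), hi))        # saturate to the output type range
        y.append(row)
    return y


if __name__ == "__main__":
    # 1 * 7 * 0.5 = 3.5, which rounds to the even neighbor 4.
    print(qlinear_matmul_2d([[1]], 0.5, 0, [[7]], 1.0, 0, 1.0, 0))  # [[4]]
```

Following the description's wording, the sketch rounds `x / y_scale` first and then adds `y_zero_point`. With round-half-to-even, this order can matter on exact ties. For example, with `y_zero_point = 1`, an intermediate 2.5 yields 2 + 1 = 3 rather than round(3.5) = 4. A production kernel may order these steps differently, so treat this as an illustration of the formula only.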