MatMulInteger

MatMulInteger - 10

Version

  • name: MatMulInteger

  • domain: main

  • since_version: 10

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 10.

Summary

Matrix product that behaves like numpy.matmul. Each individual product MUST never overflow. The accumulation may overflow if and only if it is performed in 32 bits.
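The overflow statement can be checked with simple arithmetic. The following is a minimal sketch, not the operator's implementation: it assumes that each term is a difference of two 8-bit values multiplied by another such difference, so each factor lies in [-255, 255]. The names `MAX_TERM`, `SAFE_K` and `matmul_int_2d` are illustrative only.

```python
# Worst-case magnitude of one term (a - a_zero_point) * (b - b_zero_point).
# For int8 or uint8 data with a zero point of the same type, each
# difference lies in [-255, 255].
INT32_MAX = 2**31 - 1
MAX_TERM = 255 * 255  # far below INT32_MAX, so a single product fits

# Largest inner dimension K for which a sum of K worst-case terms
# is still guaranteed to fit in a signed 32-bit accumulator.
SAFE_K = INT32_MAX // MAX_TERM


def matmul_int_2d(a, b):
    """Plain integer matrix product of two 2-D nested lists (no zero points)."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


y = matmul_int_2d([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])
print(MAX_TERM, SAFE_K, y)
```

Under these assumptions, 32-bit accumulation can only overflow once the inner dimension K exceeds `SAFE_K`. Python integers are unbounded, so this sketch never wraps around itself.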

Inputs

Between 2 and 4 inputs.

  • A (heterogeneous) - T1:

    N-dimensional matrix A

  • B (heterogeneous) - T2:

    N-dimensional matrix B

  • a_zero_point (optional, heterogeneous) - T1:

    Zero point tensor for input ‘A’. It is optional and defaults to 0. It can be a scalar or an N-D tensor: a scalar means per-tensor quantization, whereas an N-D tensor means per-row quantization. If A is 2-D with shape [M, K], the zero point tensor may be an M-element vector [zp_1, zp_2, …, zp_M]. If A is an N-D tensor with shape [D1, D2, M, K], the zero point tensor may have shape [D1, D2, M, 1].

  • b_zero_point (optional, heterogeneous) - T2:

    Zero point tensor for input ‘B’. It is optional and defaults to 0. It can be a scalar or an N-D tensor: a scalar means per-tensor quantization, whereas an N-D tensor means per-column quantization. If B is 2-D with shape [K, N], the zero point tensor may be an N-element vector [zp_1, zp_2, …, zp_N]. If B is an N-D tensor with shape [D1, D2, K, N], the zero point tensor may have shape [D1, D2, 1, N].
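The zero point inputs described above can be illustrated with a small sketch for the 2-D case. It assumes the usual quantization convention, namely that each zero point is subtracted from its input before the product is taken; the function name `matmul_integer_2d` and its nested-list interface are hypothetical and not part of ONNX. Batched N-D inputs are not covered.

```python
def matmul_integer_2d(a, b, a_zero_point=0, b_zero_point=0):
    """Sketch of MatMulInteger for 2-D inputs given as nested lists.

    a has shape M x K and b has shape K x N. Each zero point is either a
    scalar (per-tensor) or a list: M entries for A (per row) and
    N entries for B (per column).
    """
    m, k, n = len(a), len(b), len(b[0])
    # Expand scalar zero points so both cases share one code path.
    a_zp = a_zero_point if isinstance(a_zero_point, list) else [a_zero_point] * m
    b_zp = b_zero_point if isinstance(b_zero_point, list) else [b_zero_point] * n
    assert len(a_zp) == m and len(b_zp) == n, "zero point length mismatch"
    return [
        [sum((a[i][p] - a_zp[i]) * (b[p][j] - b_zp[j]) for p in range(k))
         for j in range(n)]
        for i in range(m)
    ]


# Per-tensor zero point for A, default (0) for B.
A = [[11, 7, 3], [10, 6, 2], [9, 5, 1], [8, 4, 0]]
B = [[1, 4], [2, 5], [3, 6]]
per_tensor = matmul_integer_2d(A, B, a_zero_point=12)

# Per-row zero points for A and per-column zero points for B.
per_axis = matmul_integer_2d([[3, 4], [5, 6]], [[1, 2], [3, 4]],
                             a_zero_point=[1, 2], b_zero_point=[0, 1])
print(per_tensor)
print(per_axis)
```

The scalar case subtracts the same value from every element, while the vector case applies one value per row of A or per column of B, matching the shapes [M] and [N] given in the input descriptions.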

Outputs

  • Y (heterogeneous) - T3:

    Result of the matrix multiplication of A and B.

Type Constraints

  • T1 in ( tensor(int8), tensor(uint8) ):

    Constrain input A data type to 8-bit integer tensor.

  • T2 in ( tensor(int8), tensor(uint8) ):

    Constrain input B data type to 8-bit integer tensor.

  • T3 in ( tensor(int32) ):

    Constrain output Y data type as 32-bit integer tensor.