MatMulInteger

MatMulInteger - 10

This version of the operator has been available since version 10.

Matrix product that behaves like numpy.matmul. The individual products MUST never overflow. The accumulation may overflow if and only if it is performed in 32 bits.
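The overflow rule above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the specification: the helper names `max_abs_product` and `max_safe_k` are hypothetical, zero points are ignored, and the bound uses the positive int32 limit, which is slightly conservative for mixed signed/unsigned inputs. It estimates how long the shared dimension K can be before a worst-case sum of 8-bit products no longer fits in a 32-bit accumulator.

```python
INT32_MAX = 2**31 - 1


def max_abs_product(a_is_signed: bool, b_is_signed: bool) -> int:
    """Largest magnitude of one a*b product for 8-bit operands.

    Assumption: zero points are ignored, so int8 spans [-128, 127]
    and uint8 spans [0, 255].
    """
    a_max = 128 if a_is_signed else 255
    b_max = 128 if b_is_signed else 255
    return a_max * b_max


def max_safe_k(a_is_signed: bool, b_is_signed: bool) -> int:
    """Largest K for which K worst-case products still fit in int32."""
    return INT32_MAX // max_abs_product(a_is_signed, b_is_signed)


if __name__ == "__main__":
    # A single product is far below the int32 limit, so only the
    # accumulation over K can overflow.
    print(max_abs_product(False, False))  # uint8 x uint8
    print(max_safe_k(False, False))       # uint8 x uint8
    print(max_safe_k(True, True))         # int8 x int8
```

Under these assumptions a single product never approaches the int32 limit, which is consistent with the requirement that only the accumulation may overflow.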

Between 2 and 4 inputs.

A (heterogeneous) - T1:

N-dimensional matrix A
B (heterogeneous) - T2:

N-dimensional matrix B
a_zero_point (optional, heterogeneous) - T1:

Zero point tensor for input ‘A’. It is optional and defaults to 0. It may be a scalar or an N-D tensor: a scalar means per-tensor quantization, whereas an N-D tensor means per-row quantization. If the input is 2-D with shape [M, K], the zero point tensor may be an M-element vector [zp_1, zp_2, …, zp_M]. If the input is an N-D tensor with shape [D1, D2, M, K], the zero point tensor may have shape [D1, D2, M, 1].
b_zero_point (optional, heterogeneous) - T2:

Zero point tensor for input ‘B’. It is optional and defaults to 0. It may be a scalar or an N-D tensor: a scalar means per-tensor quantization, whereas an N-D tensor means per-column quantization. If the input is 2-D with shape [K, N], the zero point tensor may be an N-element vector [zp_1, zp_2, …, zp_N]. If the input is an N-D tensor with shape [D1, D2, K, N], the zero point tensor may have shape [D1, D2, 1, N].
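To illustrate how the four inputs combine, here is a minimal pure-Python sketch for the 2-D case only. It assumes the usual reading of the operator, in which each zero point is subtracted from its input before the matrix product is taken. The function name `matmul_integer_2d` is hypothetical, inputs are plain lists of lists, and Python's unbounded integers mean that 32-bit accumulator overflow is not modeled.

```python
def matmul_integer_2d(a, b, a_zero_point=0, b_zero_point=0):
    """Illustrative 2-D MatMulInteger on lists of lists of ints.

    a: M x K matrix, b: K x N matrix.
    a_zero_point: int (per tensor) or sequence of M ints (per row).
    b_zero_point: int (per tensor) or sequence of N ints (per column).
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")

    # Expand scalar zero points so both cases share one code path.
    a_zp = [a_zero_point] * m if isinstance(a_zero_point, int) else list(a_zero_point)
    b_zp = [b_zero_point] * n if isinstance(b_zero_point, int) else list(b_zero_point)
    if len(a_zp) != m or len(b_zp) != n:
        raise ValueError("zero point length must match rows of A / columns of B")

    y = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0  # unbounded Python int; a real kernel would use int32
            for p in range(k):
                acc += (a[i][p] - a_zp[i]) * (b[p][j] - b_zp[j])
            y[i][j] = acc
    return y


if __name__ == "__main__":
    A = [[11, 7, 3], [10, 6, 2], [9, 5, 1], [8, 4, 0]]
    B = [[1, 4], [2, 5], [3, 6]]
    # Per-tensor zero point for A; B uses the default of 0.
    print(matmul_integer_2d(A, B, a_zero_point=12))
```

Passing a list instead of an integer for either zero point selects the per-row (for A) or per-column (for B) form described above. The N-D shapes [D1, D2, M, 1] and [D1, D2, 1, N] follow the same pattern through broadcasting, which this sketch does not cover.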

T1 in ( tensor(int8), tensor(uint8) ):

Constrain input A data type to 8-bit integer tensor.
T2 in ( tensor(int8), tensor(uint8) ):

Constrain input B data type to 8-bit integer tensor.
T3 in ( tensor(int32) ):

Constrain output Y data type to 32-bit integer tensor.