(onnx-detail-int2) =

2 bit integer types

Papers

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

T-MAC, an innovative lookup table(LUT)-based method designed for efficient low-bit LLM (i.e., weight-quantized LLM) inference on CPUs. T-MAC directly supports mpGEMM without dequantization, while simultaneously eliminating multiplications and reducing additions required. Specifically, T-MAC transforms the traditional data-type-centric multiplication to bit-wise table lookup, and enables a unified and scalable mpGEMM solution.

Cast

Cast from 2 bit to any higher precision type is exact. Cast to a 2 bit type is done by rounding to the nearest-integer (with ties to even) nearest-even integer and truncating.

Packing and Unpacking (2-bit)

All 2-bit types are stored as 4×2-bit values in a single byte. The elements are packed from least significant bits (LSB) to most significant bits (MSB). That is, for consecutive elements x0, x1, x2, x3 in the array:

Packing:

pack(x0, x1, x2, x3):
    (x0 & 0x03) |
    ((x1 & 0x03) << 2) |
    ((x2 & 0x03) << 4) |
    ((x3 & 0x03) << 6)

Unpacking:

x0 = z & 0x03
x1 = (z >> 2) & 0x03
x2 = (z >> 4) & 0x03
x3 = (z >> 6) & 0x03

In case the total number of elements is not divisible by 4, zero-padding will be applied in the remaining higher bits of the final byte. The storage size of a 2-bit tensor of size N is: ceil(N / 4) bytes