(l-onnx-doc-QuantizeLinear)=

# QuantizeLinear


(l-onnx-op-quantizelinear-23)=

## QuantizeLinear - 23

### Version

- **name**: [QuantizeLinear (GitHub)](https://github.com/onnx/onnx/blob/main/docs/Operators.md#QuantizeLinear)
- **domain**: `main`
- **since_version**: `23`
- **function**: `False`
- **support_level**: `SupportType.COMMON`
- **shape inference**: `True`

This version of the operator has been available
**since version 23**.

### Summary

The linear quantization operator consumes a high-precision tensor, a scale, and a zero point to compute the
low-precision/quantized tensor. The scale factor and zero point must have the same shape, determining the quantization
granularity. The quantization formula is `y = saturate((x / y_scale) + y_zero_point)`.

Saturation is done according to:
- uint16: [0, 65535]
- int16: [-32768, 32767]
- uint8: [0, 255]
- int8: [-128, 127]
- uint4: [0, 15]
- int4: [-8, 7]

The quotient `x / y_scale` is rounded to the nearest integer, with ties rounded to the nearest even value. Refer to https://en.wikipedia.org/wiki/Rounding for details.
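
As an illustration of the formula, saturation, and rounding rule above, here is a minimal NumPy sketch of per-tensor quantization to uint8. The helper name `quantize_per_tensor` and its parameters are hypothetical and not part of ONNX. It assumes `np.rint`, which rounds ties to the nearest even integer, is an acceptable stand-in for the rounding step.

```python
import numpy as np

def quantize_per_tensor(x, y_scale, y_zero_point=0, qmin=0, qmax=255, dtype=np.uint8):
    """Hypothetical helper: saturate(round(x / y_scale) + y_zero_point)."""
    x = np.asarray(x, dtype=np.float32)
    # np.rint rounds to the nearest integer, ties going to the even neighbor.
    rounded = np.rint(x / np.float32(y_scale))
    shifted = rounded + y_zero_point
    # Saturate to the representable range of the target type (uint8 by default).
    return np.clip(shifted, qmin, qmax).astype(dtype)

x = np.array([-1.0, 0.0, 0.25, 0.75, 1.0, 200.0], dtype=np.float32)
y = quantize_per_tensor(x, y_scale=0.5, y_zero_point=128)
print(y)  # [126 128 128 130 130 255]
```

The inputs `0.25` and `0.75` give the ties `0.5` and `1.5`, which round to `0` and `2`. The input `200.0` maps to `528` and saturates to `255`.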

`y_zero_point` and `y` must have the same type. `y_zero_point` is usually not used for quantization to float8 types, but the quantization
formula remains the same for consistency, and the type of the input `y_zero_point` still determines the quantization type.

There are three supported quantization granularities, determined by the shape of `y_scale`.
In all cases, `y_zero_point` must have the same shape as `y_scale`.
- Per-tensor (per-layer) quantization: `y_scale` is a scalar.
- Per-axis quantization: The scale must be a 1-D tensor, with the length of the quantization axis. For an input shape
  `(D0, ..., Di, ..., Dn)` and `axis=i`, `y_scale` is a 1-D tensor of length `Di`.
- Blocked quantization: The scale's shape is identical to the input's shape, except for one dimension, in which
  blocking is performed. Given `x` shape `(D0, ..., Di, ..., Dn)`, `axis=i`, and block size `B`: `y_scale` shape is
  `(D0, ..., ceil(Di/B), ..., Dn)`.
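
The three granularities can be seen as three ways of expanding `y_scale` to one value per element of `x`. The sketch below is one possible NumPy reading of the shape rules above, not the reference implementation. The helper name `expand_scale` is hypothetical, and it assumes an N-D scale is only used with a positive `block_size`.

```python
import numpy as np

def expand_scale(y_scale, x_shape, axis=1, block_size=0):
    """Hypothetical helper: expand y_scale so it has one entry per element of x."""
    y_scale = np.asarray(y_scale, dtype=np.float32)
    rank = len(x_shape)
    if y_scale.ndim == 0:
        # Per-tensor: a single scalar scale covers every element.
        return np.broadcast_to(y_scale, x_shape)
    axis = axis % rank  # map a negative axis to its non-negative equivalent
    if y_scale.ndim == 1 and block_size == 0:
        # Per-axis: one scale per index along `axis`.
        if y_scale.shape[0] != x_shape[axis]:
            raise ValueError("per-axis scale length must equal Di")
        view = [1] * rank
        view[axis] = x_shape[axis]
        return np.broadcast_to(y_scale.reshape(view), x_shape)
    if block_size <= 0:
        raise ValueError("assumption: blocked quantization needs block_size > 0")
    # Blocked: each scale covers `block_size` consecutive elements along `axis`.
    n_blocks = -(-x_shape[axis] // block_size)  # ceil(Di / B)
    if y_scale.shape[axis] != n_blocks:
        raise ValueError("blocked scale must have ceil(Di / B) entries along axis")
    repeated = np.repeat(y_scale, block_size, axis=axis)
    # The last block may be partial, so trim back to Di entries.
    return np.take(repeated, np.arange(x_shape[axis]), axis=axis)

per_axis = expand_scale([1.0, 2.0, 3.0], (2, 3), axis=1)
blocked = expand_scale([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], (2, 5), axis=1, block_size=2)
print(blocked)  # rows: [1 1 2 2 3] and [4 4 5 5 6]
```

With `x` of shape `(2, 5)` and `B = 2`, the scale has `ceil(5/2) = 3` columns, and the third scale in each row covers only the final element.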

### Attributes

* **axis - INT** (default is `'1'`):

  (Optional) The axis of the quantization dimension of the input tensor. Used only for per-axis and blocked quantization. A negative value means counting dimensions from the back. Accepted range is `[-r, r-1]` where `r = rank(input)`. When the rank of the input is 1, per-tensor quantization is applied, rendering the axis unnecessary in this scenario.

* **block_size - INT** (default is `'0'`):

  (Optional) The size of the quantization block (number of times every scale is replicated). Used only for blocked quantization. The block size is a positive integer. Given `x` shape `(D0, ..., Di, ..., Dn)`, `y_scale` shape `(S0, ..., Si, ..., Sn)` and `axis=i`, the accepted range is `[ceil(Di/Si), ceil(Di/(Si-1))-1]`.

* **output_dtype - INT** (default is `'0'`):

  (Optional) The output data type. If not supplied, the output data type is inferred from the `y_zero_point` data type (`T2`). If neither `output_dtype` nor `y_zero_point` is supplied, the output data type is uint8. If both `output_dtype` and `y_zero_point` are specified, `output_dtype` must be `T2`.

* **saturate - INT** (default is `'1'`):

  The parameter defines how the conversion behaves if an input value is out of range of the destination type. It only applies for float 8 quantization (float8e4m3fn, float8e4m3fnuz, float8e5m2, float8e5m2fnuz). It is true by default. All cases are fully described in two tables inserted in the operator description.
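
The accepted range for `block_size` given above is exactly the set of block sizes `B` for which `ceil(Di/B)` equals `Si`. A small sketch, with the hypothetical helper name `block_size_range`, makes this concrete. The handling of `Si = 1`, where the stated upper bound divides by zero, is an assumption of this sketch and is not spelled out in the text above.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)

def block_size_range(d_i, s_i):
    """Hypothetical helper: inclusive (low, high) bounds on block_size.

    d_i is the size of x along the blocked axis, s_i the size of y_scale
    along the same axis.
    """
    low = ceil_div(d_i, s_i)
    if s_i == 1:
        # Assumption: with a single block, any block size >= d_i works,
        # so there is no finite upper bound.
        return low, None
    high = ceil_div(d_i, s_i - 1) - 1
    return low, high

print(block_size_range(8, 2))   # (4, 7)
print(block_size_range(10, 3))  # (4, 4)
```

For `Di = 8` and `Si = 2`, block sizes 4 through 7 all produce two blocks, while a block size of 8 would produce one.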

### Inputs

Between 2 and 3 inputs.

- **x** (heterogeneous) - **T1**:

  N-D full-precision input tensor to be quantized.
- **y_scale** (heterogeneous) - **T1**:

  Scale for doing quantization to get `y`. For per-tensor/layer quantization the scale is a scalar, for per-axis quantization it is a 1-D Tensor and for blocked quantization it has the same shape as the input, except for one dimension in which blocking is performed.
- **y_zero_point** (optional, heterogeneous) - **T2**:

  Zero point for doing quantization to get `y`. Shape must match `y_scale`. Default is uint8 with zero point of 0 if it's not specified.

### Outputs

- **y** (heterogeneous) - **T2**:

  N-D quantized output tensor. It has the same shape as input `x`.

### Type Constraints

* **T1** in ( `tensor(bfloat16)`, `tensor(float)`, `tensor(float16)`, `tensor(int32)` ):

  The type of the input 'x'.
* **T2** in ( `tensor(float4e2m1)`, `tensor(float8e4m3fn)`, `tensor(float8e4m3fnuz)`, `tensor(float8e5m2)`, `tensor(float8e5m2fnuz)`, `tensor(int16)`, `tensor(int4)`, `tensor(int8)`, `tensor(uint16)`, `tensor(uint4)`, `tensor(uint8)` ):

  The type of the input `y_zero_point` and the output `y`.

```{toctree}
text_diff_QuantizeLinear_21_23
```

(l-onnx-op-quantizelinear-21)=

## QuantizeLinear - 21

### Version

- **name**: [QuantizeLinear (GitHub)](https://github.com/onnx/onnx/blob/main/docs/Operators.md#QuantizeLinear)
- **domain**: `main`
- **since_version**: `21`
- **function**: `False`
- **support_level**: `SupportType.COMMON`
- **shape inference**: `True`

This version of the operator has been available
**since version 21**.

### Summary

The linear quantization operator consumes a high-precision tensor, a scale, and a zero point to compute the
low-precision/quantized tensor. The scale factor and zero point must have the same shape, determining the quantization
granularity. The quantization formula is `y = saturate((x / y_scale) + y_zero_point)`.

Saturation is done according to:
- uint16: [0, 65535]
- int16: [-32768, 32767]
- uint8: [0, 255]
- int8: [-128, 127]
- uint4: [0, 15]
- int4: [-8, 7]

The quotient `x / y_scale` is rounded to the nearest integer, with ties rounded to the nearest even value. Refer to https://en.wikipedia.org/wiki/Rounding for details.

`y_zero_point` and `y` must have the same type. `y_zero_point` is usually not used for quantization to float8 types, but the quantization
formula remains the same for consistency, and the type of the input `y_zero_point` still determines the quantization type.

There are three supported quantization granularities, determined by the shape of `y_scale`.
In all cases, `y_zero_point` must have the same shape as `y_scale`.
- Per-tensor (per-layer) quantization: `y_scale` is a scalar.
- Per-axis quantization: The scale must be a 1-D tensor, with the length of the quantization axis. For an input shape
  `(D0, ..., Di, ..., Dn)` and `axis=i`, `y_scale` is a 1-D tensor of length `Di`.
- Blocked quantization: The scale's shape is identical to the input's shape, except for one dimension, in which
  blocking is performed. Given `x` shape `(D0, ..., Di, ..., Dn)`, `axis=i`, and block size `B`: `y_scale` shape is
  `(D0, ..., ceil(Di/B), ..., Dn)`.
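
The integer saturation ranges listed in this summary can be kept in a lookup table and applied with a clamp. The sketch below is only illustrative. The names `INT_RANGES` and `saturate` are hypothetical, and because NumPy has no native 4-bit integer type, the clamped values stay in a wider integer array.

```python
import numpy as np

# Saturation bounds per integer target type, as listed in the summary.
INT_RANGES = {
    "uint16": (0, 65535),
    "int16": (-32768, 32767),
    "uint8": (0, 255),
    "int8": (-128, 127),
    "uint4": (0, 15),
    "int4": (-8, 7),
}

def saturate(values, type_name):
    """Hypothetical helper: clamp already-rounded values to the target range."""
    low, high = INT_RANGES[type_name]
    return np.clip(values, low, high)

v = np.array([-300, -9, 0, 7, 16, 70000])
print(saturate(v, "int4"))   # [-8 -8  0  7  7  7]
print(saturate(v, "uint4"))  # [ 0  0  0  7 15 15]
```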

### Attributes

* **axis - INT** (default is `'1'`):

  (Optional) The axis of the quantization dimension of the input tensor. Used only for per-axis and blocked quantization. A negative value means counting dimensions from the back. Accepted range is `[-r, r-1]` where `r = rank(input)`. When the rank of the input is 1, per-tensor quantization is applied, rendering the axis unnecessary in this scenario.

* **block_size - INT** (default is `'0'`):

  (Optional) The size of the quantization block (number of times every scale is replicated). Used only for blocked quantization. The block size is a positive integer. Given `x` shape `(D0, ..., Di, ..., Dn)`, `y_scale` shape `(S0, ..., Si, ..., Sn)` and `axis=i`, the accepted range is `[ceil(Di/Si), ceil(Di/(Si-1))-1]`.

* **output_dtype - INT** (default is `'0'`):

  (Optional) The output data type. If not supplied, the output data type is inferred from the `y_zero_point` data type (`T2`). If neither `output_dtype` nor `y_zero_point` is supplied, the output data type is uint8. If both `output_dtype` and `y_zero_point` are specified, `output_dtype` must be `T2`.

* **saturate - INT** (default is `'1'`):

  The parameter defines how the conversion behaves if an input value is out of range of the destination type. It only applies for float 8 quantization (float8e4m3fn, float8e4m3fnuz, float8e5m2, float8e5m2fnuz). It is true by default. All cases are fully described in two tables inserted in the operator description.
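
The rules for `output_dtype` amount to a small decision procedure. The sketch below shows one way to express it. The function name `resolve_output_dtype` is hypothetical, and the integer constants are local stand-ins that mirror the `TensorProto` data type enum values (`UNDEFINED = 0`, `UINT8 = 2`, `INT8 = 3`).

```python
UNDEFINED = 0  # attribute default, meaning "not supplied"
UINT8 = 2
INT8 = 3

def resolve_output_dtype(output_dtype=UNDEFINED, zero_point_dtype=None):
    """Hypothetical helper: pick the element type of `y`.

    zero_point_dtype is None when the optional y_zero_point input is absent.
    """
    if output_dtype == UNDEFINED:
        # Fall back to the zero point's type, or to uint8 if there is none.
        return UINT8 if zero_point_dtype is None else zero_point_dtype
    if zero_point_dtype is not None and output_dtype != zero_point_dtype:
        raise ValueError("output_dtype must match the type of y_zero_point")
    return output_dtype

print(resolve_output_dtype())                       # 2 (uint8)
print(resolve_output_dtype(zero_point_dtype=INT8))  # 3 (int8)
```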

### Inputs

Between 2 and 3 inputs.

- **x** (heterogeneous) - **T1**:

  N-D full-precision input tensor to be quantized.
- **y_scale** (heterogeneous) - **T1**:

  Scale for doing quantization to get `y`. For per-tensor/layer quantization the scale is a scalar, for per-axis quantization it is a 1-D Tensor and for blocked quantization it has the same shape as the input, except for one dimension in which blocking is performed.
- **y_zero_point** (optional, heterogeneous) - **T2**:

  Zero point for doing quantization to get `y`. Shape must match `y_scale`. Default is uint8 with zero point of 0 if it's not specified.

### Outputs

- **y** (heterogeneous) - **T2**:

  N-D quantized output tensor. It has the same shape as input `x`.

### Type Constraints

* **T1** in ( `tensor(bfloat16)`, `tensor(float)`, `tensor(float16)`, `tensor(int32)` ):

  The type of the input 'x'.
* **T2** in ( `tensor(float8e4m3fn)`, `tensor(float8e4m3fnuz)`, `tensor(float8e5m2)`, `tensor(float8e5m2fnuz)`, `tensor(int16)`, `tensor(int4)`, `tensor(int8)`, `tensor(uint16)`, `tensor(uint4)`, `tensor(uint8)` ):

  The type of the input `y_zero_point` and the output `y`.

```{toctree}
text_diff_QuantizeLinear_19_23
text_diff_QuantizeLinear_19_21
```

(l-onnx-op-quantizelinear-19)=

## QuantizeLinear - 19

### Version

- **name**: [QuantizeLinear (GitHub)](https://github.com/onnx/onnx/blob/main/docs/Operators.md#QuantizeLinear)
- **domain**: `main`
- **since_version**: `19`
- **function**: `False`
- **support_level**: `SupportType.COMMON`
- **shape inference**: `True`

This version of the operator has been available
**since version 19**.

### Summary

The linear quantization operator consumes a high-precision tensor, a scale, and a zero point to compute the low-precision/quantized tensor.
The scale factor and zero point must have the same shape, and can be either a scalar for per-tensor/per-layer quantization, or a 1-D tensor for per-axis quantization.
The quantization formula is `y = saturate((x / y_scale) + y_zero_point)`.
The result saturates to [0, 255] for uint8 and to [-128, 127] for int8.
The quotient `x / y_scale` is rounded to the nearest integer, with ties rounded to the nearest even value. Refer to https://en.wikipedia.org/wiki/Rounding for details.
`y_zero_point` and `y` must have the same type.
`y_zero_point` is usually not used for quantization to float8e4m3fn, float8e4m3fnuz, float8e5m2, or float8e5m2fnuz,
but the quantization formula remains the same for consistency, and
the type of the input `y_zero_point` still determines the quantization type.

### Attributes

* **axis - INT** (default is `'1'`):

  (Optional) The axis of the quantization dimension of the input tensor. Ignored for per-tensor quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input).

* **saturate - INT** (default is `'1'`):

  The parameter defines how the conversion behaves if an input value is out of range of the destination type. It only applies for float 8 quantization (float8e4m3fn, float8e4m3fnuz, float8e5m2, float8e5m2fnuz). It is true by default. All cases are fully described in two tables inserted in the operator description.

### Inputs

Between 2 and 3 inputs.

- **x** (heterogeneous) - **T1**:

  N-D full-precision input tensor to be quantized.
- **y_scale** (heterogeneous) - **T1**:

  Scale for doing quantization to get 'y'. It can be a scalar, which means per-tensor/layer quantization, or a 1-D Tensor for per-axis quantization.
- **y_zero_point** (optional, heterogeneous) - **T2**:

  Zero point for doing quantization to get 'y'. Shape must match y_scale. Default is uint8 with zero point of 0 if it's not specified.

### Outputs

- **y** (heterogeneous) - **T2**:

  N-D quantized output tensor. It has the same shape as input 'x'.

### Type Constraints

* **T1** in ( `tensor(bfloat16)`, `tensor(float)`, `tensor(float16)`, `tensor(int32)` ):

  Constrain 'x' to float, float16, bfloat16 or int32 tensor.
* **T2** in ( `tensor(float8e4m3fn)`, `tensor(float8e4m3fnuz)`, `tensor(float8e5m2)`, `tensor(float8e5m2fnuz)`, `tensor(int8)`, `tensor(uint8)` ):

  Constrain 'y_zero_point' and 'y' to 8-bit integer/float tensor.

```{toctree}
text_diff_QuantizeLinear_13_23
text_diff_QuantizeLinear_13_21
text_diff_QuantizeLinear_13_19
```

(l-onnx-op-quantizelinear-13)=

## QuantizeLinear - 13

### Version

- **name**: [QuantizeLinear (GitHub)](https://github.com/onnx/onnx/blob/main/docs/Operators.md#QuantizeLinear)
- **domain**: `main`
- **since_version**: `13`
- **function**: `False`
- **support_level**: `SupportType.COMMON`
- **shape inference**: `True`

This version of the operator has been available
**since version 13**.

### Summary

The linear quantization operator consumes a high-precision tensor, a scale, and a zero point to compute the low-precision/quantized tensor.
The scale factor and zero point must have the same shape, and can be either a scalar for per-tensor/per-layer quantization, or a 1-D tensor for per-axis quantization.
The quantization formula is `y = saturate((x / y_scale) + y_zero_point)`.
The result saturates to [0, 255] for uint8 and to [-128, 127] for int8.
The quotient `x / y_scale` is rounded to the nearest integer, with ties rounded to the nearest even value. Refer to https://en.wikipedia.org/wiki/Rounding for details. `y_zero_point` and `y` must have the same type.
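
To illustrate per-axis quantization with a 1-D scale and zero point, here is a minimal NumPy sketch. The helper name `quantize_per_axis` is hypothetical. It assumes the caller passes `y_zero_point` as an int8 or uint8 array, so that the output type can be taken from it, and it uses `np.rint` for the round-to-nearest-even step.

```python
import numpy as np

def quantize_per_axis(x, y_scale, y_zero_point, axis=1):
    """Hypothetical helper: per-axis saturate(round(x / y_scale) + y_zero_point)."""
    x = np.asarray(x, dtype=np.float32)
    zero_point = np.asarray(y_zero_point)
    axis = axis % x.ndim  # map a negative axis to its non-negative equivalent
    # Reshape the 1-D parameters so they broadcast along `axis` only.
    view = [1] * x.ndim
    view[axis] = -1
    scale = np.asarray(y_scale, dtype=np.float32).reshape(view)
    shifted = np.rint(x / scale) + zero_point.reshape(view).astype(np.float32)
    # The output type, and therefore the saturation range, follows y_zero_point.
    info = np.iinfo(zero_point.dtype)
    return np.clip(shifted, info.min, info.max).astype(zero_point.dtype)

x = np.array([[1.0, 10.0], [-3.0, 1000.0]], dtype=np.float32)
y = quantize_per_axis(
    x,
    y_scale=[0.5, 4.0],
    y_zero_point=np.array([0, -10], dtype=np.int8),
    axis=1,
)
print(y)  # rows: [2 -8] and [-6 127]
```

Column 0 uses scale `0.5` and zero point `0`, and column 1 uses scale `4.0` and zero point `-10`. The value `10.0 / 4.0 = 2.5` is a tie and rounds to `2`, and `1000.0` saturates at the int8 maximum.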

### Attributes

* **axis - INT** (default is `'1'`):

  (Optional) The axis of the quantization dimension of the input tensor. Ignored for per-tensor quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input).
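
The accepted range `[-r, r-1]` for `axis` maps onto non-negative dimension indices in the usual way. A short sketch, with the hypothetical helper name `normalize_axis`:

```python
def normalize_axis(axis, rank):
    """Hypothetical helper: validate `axis` and return its non-negative form."""
    if not -rank <= axis <= rank - 1:
        raise ValueError(f"axis {axis} is outside [-{rank}, {rank - 1}]")
    # Negative values count dimensions from the back.
    return axis + rank if axis < 0 else axis

print(normalize_axis(-1, 3))  # 2
print(normalize_axis(1, 3))   # 1
```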

### Inputs

Between 2 and 3 inputs.

- **x** (heterogeneous) - **T1**:

  N-D full-precision input tensor to be quantized.
- **y_scale** (heterogeneous) - **tensor(float)**:

  Scale for doing quantization to get 'y'. It can be a scalar, which means per-tensor/layer quantization, or a 1-D Tensor for per-axis quantization.
- **y_zero_point** (optional, heterogeneous) - **T2**:

  Zero point for doing quantization to get 'y'. Shape must match y_scale. Default is uint8 with zero point of 0 if it's not specified.

### Outputs

- **y** (heterogeneous) - **T2**:

  N-D quantized output tensor. It has the same shape as input 'x'.

### Type Constraints

* **T1** in ( `tensor(float)`, `tensor(int32)` ):

  Constrain 'x' to float or int32 tensor.
* **T2** in ( `tensor(int8)`, `tensor(uint8)` ):

  Constrain 'y_zero_point' and 'y' to 8-bit integer tensor.

```{toctree}
text_diff_QuantizeLinear_10_23
text_diff_QuantizeLinear_10_21
text_diff_QuantizeLinear_10_19
text_diff_QuantizeLinear_10_13
```

(l-onnx-op-quantizelinear-10)=

## QuantizeLinear - 10

### Version

- **name**: [QuantizeLinear (GitHub)](https://github.com/onnx/onnx/blob/main/docs/Operators.md#QuantizeLinear)
- **domain**: `main`
- **since_version**: `10`
- **function**: `False`
- **support_level**: `SupportType.COMMON`
- **shape inference**: `True`

This version of the operator has been available
**since version 10**.

### Summary

The linear per-tensor/layer quantization operator consumes a high-precision tensor, a scale, and a zero point to compute the low-precision/quantized tensor.
The quantization formula is `y = saturate((x / y_scale) + y_zero_point)`. The result saturates to [0, 255] for uint8 and to [-128, 127] for int8.
The quotient `x / y_scale` is rounded to the nearest integer, with ties rounded to the nearest even value. Refer to https://en.wikipedia.org/wiki/Rounding for details. `y_zero_point` and `y` must have the same type.
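
The rounding rule differs from the common "round half up" shortcut only on exact ties. The comparison below is a small illustration and assumes NumPy's `np.rint`, which rounds ties to the nearest even integer, models the rule described above.

```python
import numpy as np

ratios = np.array([0.5, 1.5, 2.5, -0.5, -1.5, 2.4, 2.6])

# Round to nearest, ties to even: 0.5 -> 0, 1.5 -> 2, 2.5 -> 2.
half_even = np.rint(ratios)

# Round half up, shown only for contrast: 0.5 -> 1, 1.5 -> 2, 2.5 -> 3.
half_up = np.floor(ratios + 0.5)

# The two schemes disagree on some ties and agree on all non-tie values.
differs = half_even != half_up
print(ratios[differs])  # [ 0.5  2.5 -1.5]
```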

### Inputs

Between 2 and 3 inputs.

- **x** (heterogeneous) - **T1**:

  N-D full-precision input tensor to be quantized.
- **y_scale** (heterogeneous) - **tensor(float)**:

  Scale for doing quantization to get 'y'. It's a scalar, which means a per-tensor/layer quantization.
- **y_zero_point** (optional, heterogeneous) - **T2**:

  Zero point for doing quantization to get 'y'. It's a scalar, which means a per-tensor/layer quantization. Default value is uint8 typed 0 if it's not specified.

### Outputs

- **y** (heterogeneous) - **T2**:

  N-D quantized output tensor. It has the same shape as input 'x'.

### Type Constraints

* **T1** in ( `tensor(float)`, `tensor(int32)` ):

  Constrain 'x' to float or int32 tensor.
* **T2** in ( `tensor(int8)`, `tensor(uint8)` ):

  Constrain 'y_zero_point' and 'y' to 8-bit integer tensor.