DequantizeLinear

DequantizeLinear - 23

Version

  • name: DequantizeLinear (GitHub)

  • domain: main

  • since_version: 23

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 23.

Summary

The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the full-precision tensor. The dequantization formula is y = (x - x_zero_point) * x_scale. x_scale and x_zero_point must have the same shape, which determines the quantization granularity: a scalar for per-tensor/per-layer quantization, a 1-D tensor for per-axis quantization, or a tensor with the same rank as the input for blocked quantization. See QuantizeLinear for details on quantization granularity.

x_zero_point and x must have the same type. x and y must have the same shape. When dequantizing int32, there is no zero point (the zero point is assumed to be 0). A zero point is usually not used for float8 and 4-bit type quantization, but the dequantization formula remains the same for consistency. The output type is determined by the attribute output_dtype. If output_dtype is not supplied, the output type is the same as the type of x_scale. The output type also determines the precision of the multiplication operation.
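As an illustration of the formula above, here is a minimal NumPy sketch of per-tensor dequantization. It is not the ONNX reference implementation; the function name `dequantize_per_tensor` is hypothetical, and widening to int32 before the subtraction is an assumption made here so that 8-bit inputs cannot wrap around.

```python
import numpy as np

def dequantize_per_tensor(x, x_scale, x_zero_point=None):
    """Sketch of y = (x - x_zero_point) * x_scale with scalar scale and zero point."""
    x_scale = np.asarray(x_scale)
    # A missing zero point is treated as 0.
    zp = np.int32(0 if x_zero_point is None else x_zero_point)
    # Subtract in a wider integer type so uint8/int8 values cannot wrap around.
    diff = x.astype(np.int32) - zp
    # Without output_dtype, the result takes the type of x_scale.
    return diff.astype(x_scale.dtype) * x_scale

x = np.array([0, 3, 128, 255], dtype=np.uint8)
y = dequantize_per_tensor(x, np.float32(2.0), np.uint8(128))
# y holds the values -256, -250, 0, 254 as float32
```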

Attributes

  • axis - INT (default is '1'):

    (Optional) The axis of the dequantizing dimension of the input tensor. Used for per-axis and blocked quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input).

  • block_size - INT (default is '0'):

    (Optional) The size of the quantization block (number of times every scale is replicated). Used only for blocked quantization. The block size is a positive integer. Given x shape (D0, ..., Di, ..., Dn), x_scale shape (S0, ..., Si, ..., Sn) and axis=i, the accepted range is [ceil(Di/Si), ceil(Di/(Si-1))-1].

  • output_dtype - INT (default is '0'):

    (Optional) The output data type. If not supplied, the output data type is inferred from the x_scale data type (T2).
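The interaction of axis, block_size, and output_dtype can be sketched in NumPy as follows. This is an illustrative sketch rather than the reference implementation: the function name `dequantize_blocked` is hypothetical, and `out_dtype` takes a NumPy dtype as a stand-in for the integer-valued output_dtype attribute.

```python
import numpy as np

def dequantize_blocked(x, x_scale, x_zero_point=None, axis=1, block_size=2, out_dtype=None):
    """Blocked dequantization: each scale covers block_size elements along axis."""
    # Fall back to the x_scale type when no output type is requested.
    out_dtype = x_scale.dtype if out_dtype is None else np.dtype(out_dtype)
    axis = axis % x.ndim          # normalize a negative axis
    d = x.shape[axis]

    def expand(t):
        # Replicate every entry block_size times, then trim to the input length.
        rep = np.repeat(t, block_size, axis=axis)
        return np.take(rep, np.arange(d), axis=axis)

    diff = x.astype(np.int32)
    if x_zero_point is not None:
        diff = diff - expand(x_zero_point).astype(np.int32)
    # The multiplication is carried out in the output type.
    return diff.astype(out_dtype) * expand(x_scale).astype(out_dtype)

x = np.array([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=np.int8)
x_scale = np.array([[0.5, 2.0], [1.0, 4.0]], dtype=np.float32)
y = dequantize_blocked(x, x_scale, axis=1, block_size=2, out_dtype=np.float16)
# y rows: (0.5, 1, 6, 8) and (-1, -2, -12, -16), stored as float16
```

Here x has shape (2, 4) and x_scale has shape (2, 2), so each scale covers two consecutive elements along axis 1.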

Inputs

Between 2 and 3 inputs.

  • x (heterogeneous) - T1:

    N-D quantized input tensor to be de-quantized.

  • x_scale (heterogeneous) - T2:

    Scale for input x. For per-tensor/layer dequantization the scale is a scalar, for per-axis dequantization it is a 1-D tensor, and for blocked dequantization it has the same shape as the input, except for one dimension in which blocking is performed.

  • x_zero_point (optional, heterogeneous) - T1:

    Zero point for input x. Shape must match x_scale. It’s optional. Zero point is 0 when it’s not specified.

Outputs

  • y (heterogeneous) - T3:

    N-D full precision output tensor. It has the same shape as input x. The data type is specified by the output_dtype attribute or, in its absence, the type of x_scale.

Type Constraints

  • T1 in ( tensor(float4e2m1), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int4), tensor(int8), tensor(uint16), tensor(uint4), tensor(uint8) ):

    The type of the inputs ‘x_zero_point’ and ‘x’.

  • T2 in ( tensor(bfloat16), tensor(float), tensor(float16) ):

    The type of the input ‘x_scale’.

  • T3 in ( tensor(bfloat16), tensor(float), tensor(float16) ):

    The type of the output ‘y’.

DequantizeLinear - 21

Version

  • name: DequantizeLinear (GitHub)

  • domain: main

  • since_version: 21

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 21.

Summary

The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the full-precision tensor. The dequantization formula is y = (x - x_zero_point) * x_scale. x_scale and x_zero_point must have the same shape, which determines the quantization granularity: a scalar for per-tensor/per-layer quantization, a 1-D tensor for per-axis quantization, or a tensor with the same rank as the input for blocked quantization. See QuantizeLinear for details on quantization granularity.

x_zero_point and x must have the same type. x and y must have the same shape. When dequantizing int32, there is no zero point (the zero point is assumed to be 0). A zero point is usually not used for float8 type quantization, but the dequantization formula remains the same for consistency, and x_scale still determines the output type.

Attributes

  • axis - INT (default is '1'):

    (Optional) The axis of the dequantizing dimension of the input tensor. Used for per-axis and blocked quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input).

  • block_size - INT (default is '0'):

    (Optional) The size of the quantization block (number of times every scale is replicated). Used only for blocked quantization. The block size is a positive integer. Given x shape (D0, ..., Di, ..., Dn), x_scale shape (S0, ..., Si, ..., Sn) and axis=i, the accepted range is [ceil(Di/Si), ceil(Di/(Si-1))-1].
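The accepted block_size range can be worked out with a small helper. The sketch below is only an illustration of the stated formula; the name `block_size_range` is hypothetical, and because the upper bound divides by Si - 1, the helper deliberately covers only Si >= 2.

```python
def block_size_range(d_i, s_i):
    """Return (lower, upper) bounds for block_size: [ceil(Di/Si), ceil(Di/(Si-1)) - 1]."""
    if s_i < 2:
        # The upper bound divides by (Si - 1); this sketch does not cover Si = 1.
        raise ValueError("this sketch only handles Si >= 2")
    lower = -(-d_i // s_i)             # integer ceil(Di / Si)
    upper = -(-d_i // (s_i - 1)) - 1   # integer ceil(Di / (Si - 1)) - 1
    return lower, upper

print(block_size_range(10, 3))  # (4, 4)
print(block_size_range(8, 2))   # (4, 7)
```

For Di = 10 and Si = 3, only block_size = 4 yields exactly three blocks (4 + 4 + 2 elements); for Di = 8 and Si = 2, any block size from 4 to 7 yields exactly two blocks.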

Inputs

Between 2 and 3 inputs.

  • x (heterogeneous) - T1:

    N-D quantized input tensor to be de-quantized.

  • x_scale (heterogeneous) - T2:

    Scale for input x. For per-tensor/layer dequantization the scale is a scalar, for per-axis dequantization it is a 1-D tensor, and for blocked dequantization it has the same shape as the input, except for one dimension in which blocking is performed.

  • x_zero_point (optional, heterogeneous) - T1:

    Zero point for input x. Shape must match x_scale. It’s optional. Zero point is 0 when it’s not specified.

Outputs

  • y (heterogeneous) - T2:

    N-D full precision output tensor. It has the same shape as input x.

Type Constraints

  • T1 in ( tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int4), tensor(int8), tensor(uint16), tensor(uint4), tensor(uint8) ):

    The type of the inputs ‘x_zero_point’ and ‘x’.

  • T2 in ( tensor(bfloat16), tensor(float), tensor(float16) ):

    ‘x_scale’ determines the output type.

DequantizeLinear - 19

Version

  • name: DequantizeLinear (GitHub)

  • domain: main

  • since_version: 19

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 19.

Summary

The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the full-precision tensor. The dequantization formula is y = (x - x_zero_point) * x_scale. x_scale and x_zero_point must have the same shape, and can be either a scalar for per-tensor/per-layer quantization, or a 1-D tensor for per-axis quantization. x_zero_point and x must have the same type. x and y must have the same shape. When dequantizing int32, there is no zero point (the zero point is assumed to be 0). A zero point is usually not used for float8e4m3fn, float8e4m3fnuz, float8e5m2, or float8e5m2fnuz quantization, but the dequantization formula remains the same for consistency, and ‘x_scale’ still determines the output type.

Attributes

  • axis - INT (default is '1'):

    (Optional) The axis of the dequantizing dimension of the input tensor. Used only for per-axis quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input). When the rank of the input is 1, per-tensor quantization is applied, rendering the axis unnecessary in this scenario.
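Per-axis dequantization with the axis attribute can be sketched in NumPy by reshaping the 1-D scale and zero point so that they broadcast along the chosen dimension. This is an illustrative sketch, not the reference implementation; the name `dequantize_per_axis` is hypothetical.

```python
import numpy as np

def dequantize_per_axis(x, x_scale, x_zero_point=None, axis=1):
    """Per-axis dequantization: one scale (and zero point) per slice along axis."""
    axis = axis % x.ndim                 # negative values count from the back
    shape = [1] * x.ndim
    shape[axis] = x.shape[axis]          # e.g. (2, 1) for axis=0 on a 2-D input
    diff = x.astype(np.int32)
    if x_zero_point is not None:
        diff = diff - x_zero_point.astype(np.int32).reshape(shape)
    # x_scale determines the output type.
    return diff.astype(x_scale.dtype) * x_scale.reshape(shape)

x = np.array([[10, 20, 30], [40, 50, 60]], dtype=np.uint8)
x_scale = np.array([0.5, 2.0], dtype=np.float32)
x_zero_point = np.array([10, 40], dtype=np.uint8)
y = dequantize_per_axis(x, x_scale, x_zero_point, axis=0)
# y rows: (0, 5, 10) and (0, 20, 40), stored as float32
```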

Inputs

Between 2 and 3 inputs.

  • x (heterogeneous) - T1:

    N-D quantized input tensor to be de-quantized.

  • x_scale (heterogeneous) - T2:

    Scale for input ‘x’. It can be a scalar, which means a per-tensor/layer dequantization, or a 1-D tensor for per-axis dequantization.

  • x_zero_point (optional, heterogeneous) - T1:

    Zero point for input ‘x’. Shape must match x_scale. It’s optional. Zero point is 0 when it’s not specified.

Outputs

  • y (heterogeneous) - T2:

    N-D full precision output tensor. It has the same shape as input ‘x’.

Type Constraints

  • T1 in ( tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int32), tensor(int8), tensor(uint8) ):

    Constrain ‘x_zero_point’ and ‘x’ to 8-bit integer or float, or 32-bit integer tensor.

  • T2 in ( tensor(bfloat16), tensor(float), tensor(float16) ):

    ‘x_scale’ determines the output type.

DequantizeLinear - 13

Version

  • name: DequantizeLinear (GitHub)

  • domain: main

  • since_version: 13

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 13.

Summary

The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the full-precision tensor. The dequantization formula is y = (x - x_zero_point) * x_scale. x_scale and x_zero_point must have the same shape, and can be either a scalar for per-tensor/per-layer quantization, or a 1-D tensor for per-axis quantization. x_zero_point and x must have the same type. x and y must have the same shape. When dequantizing int32, there is no zero point (the zero point is assumed to be 0).

Attributes

  • axis - INT (default is '1'):

    (Optional) The axis of the dequantizing dimension of the input tensor. Ignored for per-tensor quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input).

Inputs

Between 2 and 3 inputs.

  • x (heterogeneous) - T:

    N-D quantized input tensor to be de-quantized.

  • x_scale (heterogeneous) - tensor(float):

    Scale for input ‘x’. It can be a scalar, which means a per-tensor/layer dequantization, or a 1-D tensor for per-axis dequantization.

  • x_zero_point (optional, heterogeneous) - T:

    Zero point for input ‘x’. Shape must match x_scale. It’s optional. Zero point is 0 when it’s not specified.

Outputs

  • y (heterogeneous) - tensor(float):

    N-D full precision output tensor. It has the same shape as input ‘x’.

Type Constraints

  • T in ( tensor(int32), tensor(int8), tensor(uint8) ):

    Constrain ‘x_zero_point’ and ‘x’ to 8-bit/32-bit integer tensor.

DequantizeLinear - 10

Version

  • name: DequantizeLinear (GitHub)

  • domain: main

  • since_version: 10

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 10.

Summary

The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the full-precision tensor. The dequantization formula is y = (x - x_zero_point) * x_scale. ‘x_scale’ and ‘x_zero_point’ are both scalars. ‘x_zero_point’ and ‘x’ must have the same type. ‘x’ and ‘y’ must have the same shape. When dequantizing int32, there is no zero point (the zero point is assumed to be 0).
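The scalar-only behaviour of this version, including the int32 case without a zero point, might look like this in NumPy. This is a sketch under assumptions: the name `dequantize_v10` is hypothetical, and the widening to int64 before the subtraction is a choice made here, not something the text above specifies.

```python
import numpy as np

def dequantize_v10(x, x_scale, x_zero_point=None):
    """Scalar scale and zero point only; the output is always float32."""
    x_scale = np.asarray(x_scale, dtype=np.float32)
    if x_scale.ndim != 0:
        raise ValueError("x_scale must be a scalar in this version")
    # An omitted zero point defaults to 0 (always the case for int32 inputs).
    zp = np.int64(0 if x_zero_point is None else x_zero_point)
    return (x.astype(np.int64) - zp).astype(np.float32) * x_scale

# int32 input, for example an accumulated bias, dequantized without a zero point
bias = np.array([-1000, 0, 250], dtype=np.int32)
y_bias = dequantize_v10(bias, 0.5)

# uint8 input with a scalar zero point
q = np.array([0, 128, 255], dtype=np.uint8)
y_q = dequantize_v10(q, 0.25, np.uint8(128))
```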

Inputs

Between 2 and 3 inputs.

  • x (heterogeneous) - T:

    N-D quantized input tensor to be de-quantized.

  • x_scale (heterogeneous) - tensor(float):

    Scale for input ‘x’. It’s a scalar, which means a per-tensor/layer quantization.

  • x_zero_point (optional, heterogeneous) - T:

    Zero point for input ‘x’. It’s a scalar, which means a per-tensor/layer quantization. It’s optional. 0 is the default value when it’s not specified.

Outputs

  • y (heterogeneous) - tensor(float):

    N-D full precision output tensor. It has the same shape as input ‘x’.

Type Constraints

  • T in ( tensor(int32), tensor(int8), tensor(uint8) ):

    Constrain ‘x_zero_point’ and ‘x’ to 8-bit/32-bit integer tensor.