DequantizeLinear¶
DequantizeLinear - 23¶
Version¶
domain:
main
since_version:
23
function:
False
support_level:
SupportType.COMMON
shape inference:
True
This version of the operator has been available since version 23.
Summary¶
The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the full-precision tensor. The dequantization formula is y = (x - x_zero_point) * x_scale. x_scale and x_zero_point must have the same shape, which determines the quantization’s granularity: a scalar for per-tensor/per-layer quantization, a 1-D tensor for per-axis quantization, or a tensor with the same rank as the input for blocked quantization. See QuantizeLinear for details on quantization granularity.
x_zero_point and x must have the same type. x and y must have the same shape. When dequantizing int32, there’s no zero point (the zero point is supposed to be 0). A zero point is usually not used when quantizing to float8 and 4-bit types, but the dequantization formula remains the same for consistency. The output type is determined by the attribute output_dtype. If output_dtype is not supplied, the output type is the same as the type of x_scale. The output type also determines the precision of the multiplication operation.
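The formula can be illustrated with a minimal sketch for the per-tensor case. It uses plain Python lists rather than an ONNX runtime; the helper name dequantize_per_tensor and the sample numbers are hypothetical and only serve the illustration.

```python
def dequantize_per_tensor(x, x_scale, x_zero_point=0):
    """Apply y = (x - x_zero_point) * x_scale to every element of a flat list."""
    return [(q - x_zero_point) * x_scale for q in x]


# uint8 values with scale 0.5 and zero point 128 (illustrative numbers).
x = [0, 128, 130, 255]
y = dequantize_per_tensor(x, x_scale=0.5, x_zero_point=128)
print(y)  # [-64.0, 0.0, 1.0, 63.5]
```

The output list has the same length as the input, mirroring the rule that x and y share a shape.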
Attributes¶
axis - INT (default is '1'):
(Optional) The axis of the dequantizing dimension of the input tensor. Used for per-axis and blocked quantization. A negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input).
block_size - INT (default is '0'):
(Optional) The size of the quantization block (number of times every scale is replicated). Used only for blocked quantization. The block size is a positive integer. Given x shape (D0, ..., Di, ..., Dn), x_scale shape (S0, ..., Si, ..., Sn) and axis=i, the accepted range is [ceil(Di/Si), ceil(Di/(Si-1))-1].
output_dtype - INT (default is '0'):
(Optional) The output data type. If not supplied, the output data type is inferred from the x_scale data type (T2).
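As a worked example of the accepted block_size range, the sketch below evaluates the two bounds with integer arithmetic. The helper names are hypothetical, and the sketch assumes Si >= 2, because the upper bound divides by Si - 1 and this page does not say how Si = 1 is handled.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def block_size_range(d_i, s_i):
    """Return (low, high) = (ceil(Di/Si), ceil(Di/(Si-1)) - 1)."""
    if s_i < 2:
        # Assumption of this sketch: Si == 1 is left to the caller,
        # since the upper bound would divide by zero.
        raise ValueError("this sketch needs Si >= 2")
    return ceil_div(d_i, s_i), ceil_div(d_i, s_i - 1) - 1


print(block_size_range(8, 2))   # (4, 7)
print(block_size_range(10, 3))  # (4, 4)
```

Every block size in the returned range splits Di into exactly Si blocks. With Di = 8 and Si = 2, for instance, the sizes 4 through 7 each give two blocks, while 8 would give one.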
Inputs¶
Between 2 and 3 inputs.
x (heterogeneous) - T1:
N-D quantized input tensor to be de-quantized.
x_scale (heterogeneous) - T2:
Scale for input x. For per-tensor/layer dequantization the scale is a scalar, for per-axis dequantization it is a 1-D tensor, and for blocked dequantization it has the same shape as the input, except for one dimension in which blocking is performed.
x_zero_point (optional, heterogeneous) - T1:
Zero point for input x. Shape must match x_scale. It’s optional. Zero point is 0 when it’s not specified.
Outputs¶
y (heterogeneous) - T3:
N-D full precision output tensor. It has the same shape as input x. The data type is specified by the output_dtype attribute or, in its absence, the type of x_scale.
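The output type rule can be sketched as a small selection function. The integer codes are the values of FLOAT, FLOAT16 and BFLOAT16 in the ONNX TensorProto.DataType enum, which this page does not list. Treating output_dtype = 0 as "not supplied" is an assumption based on the attribute's default, and resolve_output_type is a hypothetical name.

```python
# TensorProto.DataType codes for the three types allowed by T3.
FLOAT, FLOAT16, BFLOAT16 = 1, 10, 16
ALLOWED_OUTPUT_TYPES = {FLOAT, FLOAT16, BFLOAT16}


def resolve_output_type(x_scale_type, output_dtype=0):
    """Pick the element type of y; 0 is treated as 'output_dtype not supplied'."""
    chosen = x_scale_type if output_dtype == 0 else output_dtype
    if chosen not in ALLOWED_OUTPUT_TYPES:
        raise ValueError(f"unsupported output type: {chosen}")
    return chosen


print(resolve_output_type(FLOAT16))         # 10, falls back to the x_scale type
print(resolve_output_type(FLOAT16, FLOAT))  # 1, output_dtype takes precedence
```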
Type Constraints¶
T1 in ( tensor(float4e2m1), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int4), tensor(int8), tensor(uint16), tensor(uint4), tensor(uint8) ):
The type of the inputs ‘x_zero_point’ and ‘x’.
T2 in ( tensor(bfloat16), tensor(float), tensor(float16) ):
The type of the input ‘x_scale’.
T3 in ( tensor(bfloat16), tensor(float), tensor(float16) ):
The type of the output ‘y’.
DequantizeLinear - 21¶
Version¶
domain:
main
since_version:
21
function:
False
support_level:
SupportType.COMMON
shape inference:
True
This version of the operator has been available since version 21.
Summary¶
The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the full-precision tensor. The dequantization formula is y = (x - x_zero_point) * x_scale. x_scale and x_zero_point must have the same shape, which determines the quantization’s granularity: a scalar for per-tensor/per-layer quantization, a 1-D tensor for per-axis quantization, or a tensor with the same rank as the input for blocked quantization. See QuantizeLinear for details on quantization granularity.
x_zero_point and x must have the same type. x and y must have the same shape. When dequantizing int32, there’s no zero point (the zero point is supposed to be 0). A zero point is usually not used when quantizing to float8 types, but the dequantization formula remains the same for consistency, and x_scale still determines the output type.
Attributes¶
axis - INT (default is '1'):
(Optional) The axis of the dequantizing dimension of the input tensor. Used for per-axis and blocked quantization. A negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input).
block_size - INT (default is '0'):
(Optional) The size of the quantization block (number of times every scale is replicated). Used only for blocked quantization. The block size is a positive integer. Given x shape (D0, ..., Di, ..., Dn), x_scale shape (S0, ..., Si, ..., Sn) and axis=i, the accepted range is [ceil(Di/Si), ceil(Di/(Si-1))-1].
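Blocked dequantization can be sketched for a 2-D input blocked along axis 1. This is an illustration with nested Python lists, not a reference implementation; the function name dequantize_blocked_2d and the sample values are hypothetical.

```python
def dequantize_blocked_2d(x, x_scale, x_zero_point, block_size):
    """Blocked dequantization of a 2-D list along axis 1.

    Element x[i][j] uses the scale and zero point at [i][j // block_size],
    so each scale is shared by block_size consecutive elements.
    """
    return [
        [
            (q - x_zero_point[i][j // block_size]) * x_scale[i][j // block_size]
            for j, q in enumerate(row)
        ]
        for i, row in enumerate(x)
    ]


x = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
x_scale = [[0.5, 2.0],
           [1.0, 0.25]]
x_zero_point = [[0, 0],
                [1, 4]]
y = dequantize_blocked_2d(x, x_scale, x_zero_point, block_size=2)
print(y)  # [[0.5, 1.0, 6.0, 8.0], [4.0, 5.0, 0.75, 1.0]]
```

Here x has shape (2, 4) and the scale has shape (2, 2), so the scale matches the input in every dimension except the blocked one, and a block size of 2 replicates each scale twice.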
Inputs¶
Between 2 and 3 inputs.
x (heterogeneous) - T1:
N-D quantized input tensor to be de-quantized.
x_scale (heterogeneous) - T2:
Scale for input x. For per-tensor/layer dequantization the scale is a scalar, for per-axis dequantization it is a 1-D tensor, and for blocked dequantization it has the same shape as the input, except for one dimension in which blocking is performed.
x_zero_point (optional, heterogeneous) - T1:
Zero point for input x. Shape must match x_scale. It’s optional. Zero point is 0 when it’s not specified.
Outputs¶
y (heterogeneous) - T2:
N-D full precision output tensor. It has the same shape as input x.
Type Constraints¶
T1 in ( tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int4), tensor(int8), tensor(uint16), tensor(uint4), tensor(uint8) ):
The type of the inputs ‘x_zero_point’ and ‘x’.
T2 in ( tensor(bfloat16), tensor(float), tensor(float16) ):
‘x_scale’ determines the output type.
DequantizeLinear - 19¶
Version¶
domain:
main
since_version:
19
function:
False
support_level:
SupportType.COMMON
shape inference:
True
This version of the operator has been available since version 19.
Summary¶
The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the full precision tensor.
The dequantization formula is y = (x - x_zero_point) * x_scale. x_scale and x_zero_point must have the same shape, and can be either a scalar for per-tensor/per-layer quantization, or a 1-D tensor for per-axis quantization. x_zero_point and x must have the same type. x and y must have the same shape. When dequantizing int32, there’s no zero point (the zero point is supposed to be 0). A zero point is usually not used for float8e4m3fn, float8e4m3fnuz, float8e5m2, or float8e5m2fnuz quantization, but the dequantization formula remains the same for consistency, and ‘x_scale’ still determines the output type.
Attributes¶
axis - INT (default is '1'):
(Optional) The axis of the dequantizing dimension of the input tensor. Used only for per-axis quantization. A negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input). When the rank of the input is 1, per-tensor quantization is applied, rendering the axis unnecessary in this scenario.
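The accepted range for axis and the meaning of negative values can be sketched as follows. The helper normalize_axis is a hypothetical name used only for this illustration.

```python
def normalize_axis(axis, rank):
    """Map an axis in [-rank, rank - 1] to its non-negative equivalent."""
    if not -rank <= axis <= rank - 1:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    return axis + rank if axis < 0 else axis


print(normalize_axis(1, 4))   # 1
print(normalize_axis(-1, 4))  # 3, the last dimension
```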
Inputs¶
Between 2 and 3 inputs.
x (heterogeneous) - T1:
N-D quantized input tensor to be de-quantized.
x_scale (heterogeneous) - T2:
Scale for input ‘x’. It can be a scalar, which means a per-tensor/layer dequantization, or a 1-D tensor for per-axis dequantization.
x_zero_point (optional, heterogeneous) - T1:
Zero point for input ‘x’. Shape must match x_scale. It’s optional. Zero point is 0 when it’s not specified.
Outputs¶
y (heterogeneous) - T2:
N-D full precision output tensor. It has the same shape as input ‘x’.
Type Constraints¶
T1 in ( tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int32), tensor(int8), tensor(uint8) ):
Constrain ‘x_zero_point’ and ‘x’ to 8-bit integer or float, or 32-bit integer tensor.
T2 in ( tensor(bfloat16), tensor(float), tensor(float16) ):
‘x_scale’ determines the output type.
DequantizeLinear - 13¶
Version¶
domain:
main
since_version:
13
function:
False
support_level:
SupportType.COMMON
shape inference:
True
This version of the operator has been available since version 13.
Summary¶
The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the full precision tensor.
The dequantization formula is y = (x - x_zero_point) * x_scale. x_scale and x_zero_point must have the same shape, and can be either a scalar for per-tensor/per-layer quantization, or a 1-D tensor for per-axis quantization. x_zero_point and x must have the same type. x and y must have the same shape. When dequantizing int32, there’s no zero point (the zero point is supposed to be 0).
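Per-axis dequantization can be sketched for a 2-D input, where a 1-D scale holds one entry per index along the chosen axis. The function name dequantize_per_axis_2d and the sample values are hypothetical, and the sketch only handles axis 0 or 1 of a nested Python list.

```python
def dequantize_per_axis_2d(x, x_scale, x_zero_point=None, axis=1):
    """Per-axis dequantization of a 2-D list; x_scale has one entry per index along axis."""
    if x_zero_point is None:
        x_zero_point = [0] * len(x_scale)  # zero point is 0 when not specified
    result = []
    for i, row in enumerate(x):
        out_row = []
        for j, q in enumerate(row):
            k = i if axis == 0 else j  # index along the quantization axis
            out_row.append((q - x_zero_point[k]) * x_scale[k])
        result.append(out_row)
    return result


x = [[10, 20, 30],
     [40, 50, 60]]
y = dequantize_per_axis_2d(x, x_scale=[1.0, 0.5, 0.25], x_zero_point=[0, 10, 20])
print(y)  # [[10.0, 5.0, 2.5], [40.0, 20.0, 10.0]]
```

With the default axis of 1, each column of x gets its own scale and zero point.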
Attributes¶
axis - INT (default is '1'):
(Optional) The axis of the dequantizing dimension of the input tensor. Ignored for per-tensor quantization. A negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input).
Inputs¶
Between 2 and 3 inputs.
x (heterogeneous) - T:
N-D quantized input tensor to be de-quantized.
x_scale (heterogeneous) - tensor(float):
Scale for input ‘x’. It can be a scalar, which means a per-tensor/layer dequantization, or a 1-D tensor for per-axis dequantization.
x_zero_point (optional, heterogeneous) - T:
Zero point for input ‘x’. Shape must match x_scale. It’s optional. Zero point is 0 when it’s not specified.
Outputs¶
y (heterogeneous) - tensor(float):
N-D full precision output tensor. It has the same shape as input ‘x’.
Type Constraints¶
T in ( tensor(int32), tensor(int8), tensor(uint8) ):
Constrain ‘x_zero_point’ and ‘x’ to 8-bit/32-bit integer tensor.
DequantizeLinear - 10¶
Version¶
domain:
main
since_version:
10
function:
False
support_level:
SupportType.COMMON
shape inference:
True
This version of the operator has been available since version 10.
Summary¶
The linear dequantization operator. It consumes a quantized tensor, a scale, and a zero point to compute the full precision tensor. The dequantization formula is y = (x - x_zero_point) * x_scale. ‘x_scale’ and ‘x_zero_point’ are both scalars. ‘x_zero_point’ and ‘x’ must have the same type. ‘x’ and ‘y’ must have the same shape. When dequantizing int32, there’s no zero point (the zero point is supposed to be 0).
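The int32 case without a zero point can be sketched as below. The helper dequantize_scalar and the sample values are hypothetical; the sketch works on a nested Python list to show that the output keeps the input's shape.

```python
def dequantize_scalar(x, x_scale, x_zero_point=None):
    """Scalar scale and optional scalar zero point (0 when omitted) over a 2-D list."""
    zero_point = 0 if x_zero_point is None else x_zero_point
    return [[(q - zero_point) * x_scale for q in row] for row in x]


# int32 input: no zero point is passed, so 0 is used.
acc = [[-2000, 0], [1500, 40000]]
print(dequantize_scalar(acc, x_scale=0.5))  # [[-1000.0, 0.0], [750.0, 20000.0]]
```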
Inputs¶
Between 2 and 3 inputs.
x (heterogeneous) - T:
N-D quantized input tensor to be de-quantized.
x_scale (heterogeneous) - tensor(float):
Scale for input ‘x’. It’s a scalar, which means a per-tensor/layer quantization.
x_zero_point (optional, heterogeneous) - T:
Zero point for input ‘x’. It’s a scalar, which means a per-tensor/layer quantization. It’s optional. 0 is the default value when it’s not specified.
Outputs¶
y (heterogeneous) - tensor(float):
N-D full precision output tensor. It has the same shape as input ‘x’.
Type Constraints¶
T in ( tensor(int32), tensor(int8), tensor(uint8) ):
Constrain ‘x_zero_point’ and ‘x’ to 8-bit/32-bit integer tensor.