BatchNormalization

BatchNormalization - 15

Version

  • name: BatchNormalization (GitHub)

  • domain: main

  • since_version: 15

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 15.

Summary

Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode it is being run, There are five required inputs ‘X’, ‘scale’, ‘B’, ‘input_mean’ and ‘input_var’. Note that ‘input_mean’ and ‘input_var’ are expected to be the estimated statistics in inference mode (training_mode=False, default), and the running statistics in training mode (training_mode=True). There are multiple cases for the number of outputs, which we list below:

  • Output case #1: Y, running_mean, running_var (training_mode=True)

  • Output case #2: Y (training_mode=False)

When training_mode=False, extra outputs are invalid. The outputs are updated as follows when training_mode=True:

running_mean = input_mean * momentum + current_mean * (1 - momentum)
running_var = input_var * momentum + current_var * (1 - momentum)

Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B

where:

current_mean = ReduceMean(X, axis=all_except_channel_index)
current_var =  ReduceVar(X, axis=all_except_channel_index)

Notice that ReduceVar refers to the population variance, and it equals to sum(sqrd(x_i - x_avg)) / N where N is the population size (this formula does not use sample size N - 1).

The computation of ReduceMean and ReduceVar uses float to avoid overflow for float16 inputs.

When training_mode=False:

Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B

For previous (depreciated) non-spatial cases, implementors are suggested to flatten the input shape to (N x C * D1 * D2 * … * Dn) before a BatchNormalization Op. This operator has optional inputs/outputs. See ONNX IR for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.

Attributes

  • epsilon - FLOAT (default is '1e-05'):

    The epsilon value to use to avoid division by zero.

  • momentum - FLOAT (default is '0.9'):

    Factor used in computing the running mean and variance.e.g., running_mean = running_mean * momentum + mean * (1 - momentum).

  • training_mode - INT (default is '0'):

    If set to true, it indicates BatchNormalization is being used for training, and outputs 1 and 2 are to be computed.

Inputs

  • X (heterogeneous) - T:

    Input data tensor from the previous operator; dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size, C is the number of channels. Statistics are computed for every channel of C over N and D1 to Dn dimensions. For image data, input dimensions become (N x C x H x W). The op also accepts single dimension input of size N in which case C is assumed to be 1

  • scale (heterogeneous) - T1:

    Scale tensor of shape ©.

  • B (heterogeneous) - T1:

    Bias tensor of shape ©.

  • input_mean (heterogeneous) - T2:

    running (training) or estimated (testing) mean tensor of shape ©.

  • input_var (heterogeneous) - T2:

    running (training) or estimated (testing) variance tensor of shape ©.

Outputs

Between 1 and 3 outputs.

  • Y (heterogeneous) - T:

    The output tensor of the same shape as X

  • running_mean (optional, heterogeneous) - T2:

    The running mean after the BatchNormalization operator.

  • running_var (optional, heterogeneous) - T2:

    The running variance after the BatchNormalization operator. This op uses the population size (N) for calculating variance, and not the sample size N-1.

Type Constraints

  • T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ):

    Constrain input and output types to float tensors.

  • T1 in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ):

    Constrain scale and bias types to float tensors.

  • T2 in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ):

    Constrain mean and variance types to float tensors.

BatchNormalization - 14

Version

  • name: BatchNormalization (GitHub)

  • domain: main

  • since_version: 14

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 14.

Summary

Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode it is being run, There are five required inputs ‘X’, ‘scale’, ‘B’, ‘input_mean’ and ‘input_var’. Note that ‘input_mean’ and ‘input_var’ are expected to be the estimated statistics in inference mode (training_mode=False, default), and the running statistics in training mode (training_mode=True). There are multiple cases for the number of outputs, which we list below:

Output case #1: Y, running_mean, running_var (training_mode=True) Output case #2: Y (training_mode=False)

When training_mode=False, extra outputs are invalid. The outputs are updated as follows when training_mode=True:

running_mean = input_mean * momentum + current_mean * (1 - momentum)
running_var = input_var * momentum + current_var * (1 - momentum)

Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B

where:

current_mean = ReduceMean(X, axis=all_except_channel_index)
current_var =  ReduceVar(X, axis=all_except_channel_index)

Notice that ReduceVar refers to the population variance, and it equals to
sum(sqrd(x_i - x_avg)) / N
where N is the population size (this formula does not use sample size N - 1).

When training_mode=False:

Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B

For previous (depreciated) non-spatial cases, implementors are suggested to flatten the input shape to (N x C * D1 * D2 * … * Dn) before a BatchNormalization Op. This operator has optional inputs/outputs. See ONNX IR for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.

Attributes

  • epsilon - FLOAT (default is '1e-05'):

    The epsilon value to use to avoid division by zero.

  • momentum - FLOAT (default is '0.9'):

    Factor used in computing the running mean and variance.e.g., running_mean = running_mean * momentum + mean * (1 - momentum).

  • training_mode - INT (default is '0'):

    If set to true, it indicates BatchNormalization is being used for training, and outputs 1, 2, 3, and 4 would be populated.

Inputs

  • X (heterogeneous) - T:

    Input data tensor from the previous operator; dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size, C is the number of channels. Statistics are computed for every channel of C over N and D1 to Dn dimensions. For image data, input dimensions become (N x C x H x W). The op also accepts single dimension input of size N in which case C is assumed to be 1

  • scale (heterogeneous) - T:

    Scale tensor of shape ©.

  • B (heterogeneous) - T:

    Bias tensor of shape ©.

  • input_mean (heterogeneous) - U:

    running (training) or estimated (testing) mean tensor of shape ©.

  • input_var (heterogeneous) - U:

    running (training) or estimated (testing) variance tensor of shape ©.

Outputs

Between 1 and 3 outputs.

  • Y (heterogeneous) - T:

    The output tensor of the same shape as X

  • running_mean (optional, heterogeneous) - U:

    The running mean after the BatchNormalization operator.

  • running_var (optional, heterogeneous) - U:

    The running variance after the BatchNormalization operator. This op uses the population size (N) for calculating variance, and not the sample size N-1.

Type Constraints

  • T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ):

    Constrain input and output types to float tensors.

  • U in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ):

    Constrain mean and variance types to float tensors. It allows all float type for U.

BatchNormalization - 9

Version

  • name: BatchNormalization (GitHub)

  • domain: main

  • since_version: 9

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 9.

Summary

Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode it is being run, there are multiple cases for the number of outputs, which we list below:

Output case #1: Y, mean, var, saved_mean, saved_var (training mode) Output case #2: Y (test mode)

For previous (depreciated) non-spatial cases, implementors are suggested to flatten the input shape to (N x CD1D2 …*Dn) before a BatchNormalization Op. This operator has optional inputs/outputs. See ONNX IR for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.

Attributes

  • epsilon - FLOAT (default is '1e-05'):

    The epsilon value to use to avoid division by zero.

  • momentum - FLOAT (default is '0.9'):

    Factor used in computing the running mean and variance.e.g., running_mean = running_mean * momentum + mean * (1 - momentum).

Inputs

  • X (heterogeneous) - T:

    Input data tensor from the previous operator; dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size, C is the number of channels. Statistics are computed for every channel of C over N and D1 to Dn dimensions. For image data, input dimensions become (N x C x H x W). The op also accepts single dimension input of size N in which case C is assumed to be 1

  • scale (heterogeneous) - T:

    Scale tensor of shape ©.

  • B (heterogeneous) - T:

    Bias tensor of shape ©.

  • mean (heterogeneous) - T:

    running (training) or estimated (testing) mean tensor of shape ©.

  • var (heterogeneous) - T:

    running (training) or estimated (testing) variance tensor of shape ©.

Outputs

Between 1 and 5 outputs.

  • Y (heterogeneous) - T:

    The output tensor of the same shape as X

  • mean (optional, heterogeneous) - T:

    The running mean after the BatchNormalization operator.

  • var (optional, heterogeneous) - T:

    The running variance after the BatchNormalization operator.

  • saved_mean (optional, heterogeneous) - T:

    Saved mean used during training to speed up gradient computation.

  • saved_var (optional, heterogeneous) - T:

    Saved variance used during training to speed up gradient computation.

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ):

    Constrain input and output types to float tensors.

BatchNormalization - 7

Version

  • name: BatchNormalization (GitHub)

  • domain: main

  • since_version: 7

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 7.

Summary

Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode it is being run, there are multiple cases for the number of outputs, which we list below:

Output case #1: Y, mean, var, saved_mean, saved_var (training mode) Output case #2: Y (test mode) This operator has optional inputs/outputs. See ONNX IR for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.

Attributes

  • epsilon - FLOAT (default is '1e-05'):

    The epsilon value to use to avoid division by zero.

  • momentum - FLOAT (default is '0.9'):

    Factor used in computing the running mean and variance.e.g., running_mean = running_mean * momentum + mean * (1 - momentum).

  • spatial - INT (default is '1'):

    If true, compute the mean and variance across per activation. If false, compute the mean and variance across per feature over each mini-batch.

Inputs

  • X (heterogeneous) - T:

    Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size.

  • scale (heterogeneous) - T:

    If spatial is true, the dimension of scale is ©. If spatial is false, the dimensions of scale are (C x D1 x … x Dn)

  • B (heterogeneous) - T:

    If spatial is true, the dimension of bias is ©. If spatial is false, the dimensions of bias are (C x D1 x … x Dn)

  • mean (heterogeneous) - T:

    If spatial is true, the dimension of the running mean (training) or the estimated mean (testing) is ©. If spatial is false, the dimensions of the running mean (training) or the estimated mean (testing) are (C x D1 x … x Dn).

  • var (heterogeneous) - T:

    If spatial is true, the dimension of the running variance(training) or the estimated variance (testing) is ©. If spatial is false, the dimensions of the running variance(training) or the estimated variance (testing) are (C x D1 x … x Dn).

Outputs

Between 1 and 5 outputs.

  • Y (heterogeneous) - T:

    The output tensor of the same shape as X

  • mean (optional, heterogeneous) - T:

    The running mean after the BatchNormalization operator.

  • var (optional, heterogeneous) - T:

    The running variance after the BatchNormalization operator.

  • saved_mean (optional, heterogeneous) - T:

    Saved mean used during training to speed up gradient computation.

  • saved_var (optional, heterogeneous) - T:

    Saved variance used during training to speed up gradient computation.

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ):

    Constrain input and output types to float tensors.

BatchNormalization - 6

Version

  • name: BatchNormalization (GitHub)

  • domain: main

  • since_version: 6

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 6.

Summary

Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode it is being run, there are multiple cases for the number of outputs, which we list below:

Output case #1: Y, mean, var, saved_mean, saved_var (training mode) Output case #2: Y (test mode)

Attributes

  • epsilon - FLOAT (default is '1e-05'):

    The epsilon value to use to avoid division by zero, default is 1e-5f.

  • is_test - INT (default is '0'):

    If set to nonzero, run spatial batch normalization in test mode, default is 0.

  • momentum - FLOAT (default is '0.9'):

    Factor used in computing the running mean and variance.e.g., running_mean = running_mean * momentum + mean * (1 - momentum), default is 0.9f.

  • spatial - INT (default is '1'):

    If true, compute the mean and variance across all spatial elements If false, compute the mean and variance across per feature.Default is 1.

Inputs

  • X (heterogeneous) - T:

    Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size.

  • scale (heterogeneous) - T:

    The scale as a 1-dimensional tensor of size C to be applied to the output.

  • B (heterogeneous) - T:

    The bias as a 1-dimensional tensor of size C to be applied to the output.

  • mean (heterogeneous) - T:

    The running mean (training) or the estimated mean (testing) as a 1-dimensional tensor of size C.

  • var (heterogeneous) - T:

    The running variance (training) or the estimated variance (testing) as a 1-dimensional tensor of size C.

Outputs

Between 1 and 5 outputs.

  • Y (heterogeneous) - T:

    The output tensor of the same shape as X.

  • mean (optional, heterogeneous) - T:

    The running mean after the BatchNormalization operator. Must be in-place with the input mean. Should not be used for testing.

  • var (optional, heterogeneous) - T:

    The running variance after the BatchNormalization operator. Must be in-place with the input var. Should not be used for testing.

  • saved_mean (optional, heterogeneous) - T:

    Saved mean used during training to speed up gradient computation. Should not be used for testing.

  • saved_var (optional, heterogeneous) - T:

    Saved variance used during training to speed up gradient computation. Should not be used for testing.

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ):

    Constrain input and output types to float tensors.

BatchNormalization - 1

Version

  • name: BatchNormalization (GitHub)

  • domain: main

  • since_version: 1

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: False

This version of the operator has been available since version 1.

Summary

Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode it is being run, there are multiple cases for the number of outputs, which we list below:

Output case #1: Y, mean, var, saved_mean, saved_var (training mode) Output case #2: Y (test mode)

Attributes

  • consumed_inputs - INTS (required) :

    legacy optimization attribute.

  • epsilon - FLOAT (default is '1e-05'):

    The epsilon value to use to avoid division by zero, default is 1e-5f.

  • is_test - INT (default is '0'):

    If set to nonzero, run spatial batch normalization in test mode, default is 0.

  • momentum - FLOAT (default is '0.9'):

    Factor used in computing the running mean and variance.e.g., running_mean = running_mean * momentum + mean * (1 - momentum), default is 0.9f.

  • spatial - INT (default is '1'):

    If true, compute the mean and variance across all spatial elements If false, compute the mean and variance across per feature.Default is 1.

Inputs

  • X (heterogeneous) - T:

    The input 4-dimensional tensor of shape NCHW.

  • scale (heterogeneous) - T:

    The scale as a 1-dimensional tensor of size C to be applied to the output.

  • B (heterogeneous) - T:

    The bias as a 1-dimensional tensor of size C to be applied to the output.

  • mean (heterogeneous) - T:

    The running mean (training) or the estimated mean (testing) as a 1-dimensional tensor of size C.

  • var (heterogeneous) - T:

    The running variance (training) or the estimated variance (testing) as a 1-dimensional tensor of size C.

Outputs

Between 1 and 5 outputs.

  • Y (heterogeneous) - T:

    The output 4-dimensional tensor of the same shape as X.

  • mean (optional, heterogeneous) - T:

    The running mean after the BatchNormalization operator. Must be in-place with the input mean. Should not be used for testing.

  • var (optional, heterogeneous) - T:

    The running variance after the BatchNormalization operator. Must be in-place with the input var. Should not be used for testing.

  • saved_mean (optional, heterogeneous) - T:

    Saved mean used during training to speed up gradient computation. Should not be used for testing.

  • saved_var (optional, heterogeneous) - T:

    Saved variance used during training to speed up gradient computation. Should not be used for testing.

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ):

    Constrain input and output types to float tensors.