(l-onnx-doc-SoftmaxCrossEntropyLoss)=

# SoftmaxCrossEntropyLoss

(l-onnx-op-softmaxcrossentropyloss-13)=

## SoftmaxCrossEntropyLoss - 13

### Version

- **name**: [SoftmaxCrossEntropyLoss (GitHub)](https://github.com/onnx/onnx/blob/main/docs/Operators.md#SoftmaxCrossEntropyLoss)
- **domain**: `main`
- **since_version**: `13`
- **function**: `True`
- **support_level**: `SupportType.COMMON`
- **shape inference**: `True`

This version of the operator has been available **since version 13**.

### Summary

Loss function that measures the softmax cross entropy between 'scores' and 'labels'.
This operator first computes a loss tensor whose shape is identical to the labels input.
If the input is 2-D with shape (N, C), the loss tensor may be an N-element vector L = (l_1, l_2, ..., l_N).
If the input is an N-D tensor with shape (N, C, D1, D2, ..., Dk), the loss tensor L may have shape (N, D1, D2, ..., Dk),
where L[i][j_1][j_2]...[j_k] denotes a scalar element in L.
After L is computed, this operator can optionally apply a reduction.

* shape(scores): (N, C) where C is the number of classes, or (N, C, D1, D2, ..., Dk), with K >= 1 in case of K-dimensional loss.
* shape(labels): (N) where each value is 0 <= labels[i] <= C-1, or (N, D1, D2, ..., Dk), with K >= 1 in case of K-dimensional loss.

The loss for one sample, l_i, can be calculated as follows:

```
l[i][d1][d2]...[dk] = -y[i][c][d1][d2]...[dk], where c is the class index given by the corresponding label.
```

or

```
l[i][d1][d2]...[dk] = -y[i][c][d1][d2]...[dk] * weights[c], if 'weights' is provided.
```

The loss is zero when the label value equals ignore_index:

```
l[i][d1][d2]...[dk] = 0, when labels[i][d1][d2]...[dk] = ignore_index
```

where:

```
p = Softmax(scores)
y = Log(p)
c = labels[i][d1][d2]...[dk]
```

Finally, L is optionally reduced:

* If reduction = 'none', the output is L with shape (N, D1, D2, ..., Dk).
* If reduction = 'sum', the output is the scalar Sum(L).
* If reduction = 'mean', the output is the scalar ReduceMean(L), or if 'weights' is provided: `ReduceSum(L) / ReduceSum(W)`,
  where tensor W is of shape `(N, D1, D2, ..., Dk)` and `W[n][d1][d2]...[dk] = weights[labels[n][d1][d2]...[dk]]`.
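The computation above can be expressed compactly in NumPy. The sketch below is illustrative rather than normative: the function name and its keyword arguments are not part of the operator definition, and the handling of 'mean' together with ignore_index follows the weighted formula with W zeroed at ignored positions.

```python
import numpy as np

def softmax_cross_entropy_loss(scores, labels, weights=None,
                               reduction="mean", ignore_index=None):
    """Minimal NumPy sketch of the computation described above (illustrative)."""
    # scores: (N, C) or (N, C, D1, ..., Dk); labels: (N) or (N, D1, ..., Dk).
    # y = Log(Softmax(scores)) over the class axis (axis 1), computed stably.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Gather y[i][c][d1]...[dk] with c = labels[i][d1]...[dk]; clip so that an
    # out-of-range ignore_index value cannot break the indexing.
    c = np.clip(labels, 0, scores.shape[1] - 1)
    loss = -np.take_along_axis(log_prob, np.expand_dims(c, 1), axis=1).squeeze(1)

    # W[n][d1]...[dk] = weights[labels[n][d1]...[dk]], or all ones if absent;
    # entries whose label equals ignore_index contribute zero.
    w = np.ones_like(loss) if weights is None else weights[c]
    if ignore_index is not None:
        w = np.where(labels == ignore_index, 0.0, w)
    loss = loss * w

    if reduction == "none":
        return loss
    if reduction == "sum":
        return loss.sum()
    # 'mean': ReduceMean(L) without weights, ReduceSum(L) / ReduceSum(W) with
    # weights (assumed to apply as well when ignore_index masks some entries).
    if weights is None and ignore_index is None:
        return loss.mean()
    return loss.sum() / w.sum()
```

With `reduction="none"` the result keeps the shape (N, D1, D2, ..., Dk) of L; the other modes return a scalar.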
### Attributes

* **ignore_index - INT** : Specifies a target value that is ignored and does not contribute to the input gradient. It is an optional value.
* **reduction - STRING** (default is `'mean'`): Type of reduction to apply to the loss: 'none', 'sum' or 'mean' (default). 'none': no reduction is applied; 'sum': the output is summed; 'mean': the sum of the output is divided by the number of elements in the output.

### Inputs

Between 2 and 3 inputs.

- **scores** (heterogeneous) - **T**: The predicted outputs with shape [batch_size, class_size], or [batch_size, class_size, D1, D2, ..., Dk], where K is the number of dimensions.
- **labels** (heterogeneous) - **Tind**: The ground truth output tensor, with shape [batch_size], or [batch_size, D1, D2, ..., Dk], where K is the number of dimensions. Label element values shall be in the range [0, C). If ignore_index is specified, label values may additionally take the value ignore_index, which may lie outside [0, C).
- **weights** (optional, heterogeneous) - **T**: A manual rescaling weight given to each class. If given, it has to be a 1-D tensor assigning a weight to each of the classes. Otherwise, it is treated as if having all ones.

### Outputs

Between 1 and 2 outputs.

- **output** (heterogeneous) - **T**: Weighted loss float tensor. If reduction is 'none', this has the shape [batch_size], or [batch_size, D1, D2, ..., Dk] in case of K-dimensional loss. Otherwise, it is a scalar.
- **log_prob** (optional, heterogeneous) - **T**: Log probability tensor. If the output of softmax is prob, its value is log(prob).

### Type Constraints

* **T** in ( `tensor(bfloat16)`, `tensor(double)`, `tensor(float)`, `tensor(float16)` ): Constrain input and output types to float tensors.
* **Tind** in ( `tensor(int32)`, `tensor(int64)` ): Constrain target to integer types.
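As a usage sketch, the node can be built with `onnx.helper.make_node` and evaluated with the `ReferenceEvaluator` shipped with recent onnx releases (which can evaluate a single node); the input names and values below are illustrative, and an ONNX runtime could be used instead by wrapping the node in a full model.

```python
import numpy as np
from onnx import helper
from onnx.reference import ReferenceEvaluator

# Request both outputs; 'reduction' and 'ignore_index' are the attributes
# documented above.
node = helper.make_node(
    "SoftmaxCrossEntropyLoss",
    inputs=["scores", "labels"],
    outputs=["output", "log_prob"],
    reduction="mean",
    ignore_index=-1,
)

scores = np.random.randn(3, 5).astype(np.float32)  # (N=3, C=5)
labels = np.array([1, -1, 4], dtype=np.int64)      # the -1 entry is ignored

sess = ReferenceEvaluator(node)  # accepts a single NodeProto
loss, log_prob = sess.run(None, {"scores": scores, "labels": labels})
print(loss.shape, log_prob.shape)  # () and (3, 5)
```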
```{toctree}
text_diff_SoftmaxCrossEntropyLoss_12_13
```

(l-onnx-op-softmaxcrossentropyloss-12)=

## SoftmaxCrossEntropyLoss - 12

### Version

- **name**: [SoftmaxCrossEntropyLoss (GitHub)](https://github.com/onnx/onnx/blob/main/docs/Operators.md#SoftmaxCrossEntropyLoss)
- **domain**: `main`
- **since_version**: `12`
- **function**: `True`
- **support_level**: `SupportType.COMMON`
- **shape inference**: `True`

This version of the operator has been available **since version 12**.

### Summary

Loss function that measures the softmax cross entropy between 'scores' and 'labels'.
This operator first computes a loss tensor whose shape is identical to the labels input.
If the input is 2-D with shape (N, C), the loss tensor may be an N-element vector L = (l_1, l_2, ..., l_N).
If the input is an N-D tensor with shape (N, C, D1, D2, ..., Dk), the loss tensor L may have shape (N, D1, D2, ..., Dk),
where L[i][j_1][j_2]...[j_k] denotes a scalar element in L.
After L is computed, this operator can optionally apply a reduction.

* shape(scores): (N, C) where C is the number of classes, or (N, C, D1, D2, ..., Dk), with K >= 1 in case of K-dimensional loss.
* shape(labels): (N) where each value is 0 <= labels[i] <= C-1, or (N, D1, D2, ..., Dk), with K >= 1 in case of K-dimensional loss.

The loss for one sample, l_i, can be calculated as follows:

```
l[i][d1][d2]...[dk] = -y[i][c][d1][d2]...[dk], where c is the class index given by the corresponding label.
```

or

```
l[i][d1][d2]...[dk] = -y[i][c][d1][d2]...[dk] * weights[c], if 'weights' is provided.
```

The loss is zero when the label value equals ignore_index:

```
l[i][d1][d2]...[dk] = 0, when labels[i][d1][d2]...[dk] = ignore_index
```

where:

```
p = Softmax(scores)
y = Log(p)
c = labels[i][d1][d2]...[dk]
```

Finally, L is optionally reduced:

* If reduction = 'none', the output is L with shape (N, D1, D2, ..., Dk).
* If reduction = 'sum', the output is the scalar Sum(L).
* If reduction = 'mean', the output is the scalar ReduceMean(L), or if 'weights' is provided: `ReduceSum(L) / ReduceSum(W)`,
  where tensor W is of shape `(N, D1, D2, ..., Dk)` and `W[n][d1][d2]...[dk] = weights[labels[n][d1][d2]...[dk]]`.

### Attributes

* **ignore_index - INT** : Specifies a target value that is ignored and does not contribute to the input gradient. It is an optional value.
* **reduction - STRING** (default is `'mean'`): Type of reduction to apply to the loss: 'none', 'sum' or 'mean' (default). 'none': no reduction is applied; 'sum': the output is summed; 'mean': the sum of the output is divided by the number of elements in the output.

### Inputs

Between 2 and 3 inputs.

- **scores** (heterogeneous) - **T**: The predicted outputs with shape [batch_size, class_size], or [batch_size, class_size, D1, D2, ..., Dk], where K is the number of dimensions.
- **labels** (heterogeneous) - **Tind**: The ground truth output tensor, with shape [batch_size], or [batch_size, D1, D2, ..., Dk], where K is the number of dimensions. Label element values shall be in the range [0, C). If ignore_index is specified, label values may additionally take the value ignore_index, which may lie outside [0, C).
- **weights** (optional, heterogeneous) - **T**: A manual rescaling weight given to each class. If given, it has to be a 1-D tensor assigning a weight to each of the classes. Otherwise, it is treated as if having all ones.

### Outputs

Between 1 and 2 outputs.

- **output** (heterogeneous) - **T**: Weighted loss float tensor. If reduction is 'none', this has the shape [batch_size], or [batch_size, D1, D2, ..., Dk] in case of K-dimensional loss. Otherwise, it is a scalar.
- **log_prob** (optional, heterogeneous) - **T**: Log probability tensor. If the output of softmax is prob, its value is log(prob).

### Type Constraints

* **T** in ( `tensor(double)`, `tensor(float)`, `tensor(float16)` ): Constrain input and output types to float tensors.
* **Tind** in ( `tensor(int32)`, `tensor(int64)` ): Constrain target to integer types.
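To make the reduction formulas concrete, the short NumPy check below (with illustrative values) computes L element-wise from the summary's definition and then prints the three reduction results, including the weighted 'mean' form `ReduceSum(L) / ReduceSum(W)`.

```python
import numpy as np

scores = np.array([[3.0, 1.0], [1.0, 3.0]], dtype=np.float32)  # (N=2, C=2)
labels = np.array([0, 1], dtype=np.int64)
weights = np.array([0.7, 0.3], dtype=np.float32)

# p = Softmax(scores), y = Log(p)
p = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
y = np.log(p)

# l[i] = -y[i][labels[i]] * weights[labels[i]], and W[i] = weights[labels[i]]
L = -y[np.arange(2), labels] * weights[labels]
W = weights[labels]

print(L)                  # reduction = 'none': shape (N,)
print(L.sum())            # reduction = 'sum': scalar
print(L.sum() / W.sum())  # reduction = 'mean' with weights: ReduceSum(L) / ReduceSum(W)
```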