NegativeLogLikelihoodLoss - 12 vs 22
The next section compares an older version of this operator with a newer one, after both definitions are converted to markdown text. In the diff blocks below, lines prefixed with `+` are additions in the newer version and lines prefixed with `-` are deletions; everything else is unchanged. Apart from the `sum(loss)` sentence being folded into the preceding line, the textual changes are formatting-only (formula and example lines are deleted and re-added verbatim, with blank lines around them); the one substantive change is the addition of `tensor(bfloat16)` to the **T** type constraint.
NegativeLogLikelihoodLoss12 → NegativeLogLikelihoodLoss22 (RENAMED)
```diff
@@ -1 +1 @@
  A NegativeLogLikelihoodLoss operator computes (weighted) negative log likelihood loss.
  Its "input" tensor has the shape of (N, C, d1, d2, ..., dk) where k >= 0.
  The "input" tensor contains log-probabilities for input[n, :, d_1, d_2,..., d_k] being in a class of [0, C).
  The operator's "target" input tensor has the shape of (N, d1, d2, ..., dk). It encodes class labels (one of C classes)
  or it may contain a special value (indicated by an attribute ignore_index) for N x d1 x d2 x ... x dk samples.
  The loss value for input[n, :, d_1, d_2,...d_k] being classified as class c = target[n][d_1][d_2]...[d_k] is computed as:
+
+
- loss[n][d_1][d_2]...[d_k] = -input[n][c][d_1][d_2]...[d_k].
+ loss[n][d_1][d_2]...[d_k] = -input[n][c][d_1][d_2]...[d_k].
+
+
  When an optional "weight" is provided, the sample loss is calculated as:
+
+
- loss[n][d_1][d_2]...[d_k] = -input[n][c][d_1][d_2]...[d_k] * weight[c].
+ loss[n][d_1][d_2]...[d_k] = -input[n][c][d_1][d_2]...[d_k] * weight[c].
+
+
  loss is zero for the case when target-value equals ignore_index.
+
- loss[n][d_1][d_2]...[d_k] = 0, when target[n][d_1][d_2]...[d_k] = ignore_index
+ loss[n][d_1][d_2]...[d_k] = 0, when target[n][d_1][d_2]...[d_k] = ignore_index
+
+
  If "reduction" attribute is set to "none", the operator's output will be the above loss with shape (N, d1, d2, ..., dk).
  If "reduction" attribute is set to "mean" (the default attribute value), the output loss is (weight) averaged:
+
+
- mean(loss), if "weight" is not provided,
+ mean(loss), if "weight" is not provided,
+
+
  or if weight is provided,
+
+
- sum(loss) / sum(weight[target[n][d_1][d_2]...[d_k]]]), for all samples.
+ sum(loss) / sum(weight[target[n][d_1][d_2]...[d_k]]]), for all samples.
+
+
- If "reduction" attribute is set to "sum", the output is a scalar:
+ If "reduction" attribute is set to "sum", the output is a scalar: sum(loss).
- sum(loss).
+
  See also https://pytorch.org/docs/stable/nn.html#torch.nn.NLLLoss.
+
```
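Taken together, the lines above fully specify the operator: gather the log-probability of the labeled class, negate it, optionally scale it by weight[c], zero out entries equal to ignore_index, then apply the requested reduction. As a sanity check, the same semantics fit in a few lines of vectorized NumPy. This is an editorial sketch, not the ONNX reference implementation; the function `nll_loss` and its signature are ours.

```python
import numpy as np

def nll_loss(input, target, weight=None, reduction="mean", ignore_index=None):
    """Illustrative NumPy rendering of NegativeLogLikelihoodLoss (not the ONNX reference)."""
    input, target = np.asarray(input, dtype=float), np.asarray(target)
    C = input.shape[1]
    w = np.ones(C) if weight is None else np.asarray(weight, dtype=float)
    # Map ignored labels to class 0 so the gather below stays inside [0, C).
    ignored = (target == ignore_index) if ignore_index is not None else np.zeros(target.shape, bool)
    safe_target = np.where(ignored, 0, target)
    # loss[n, d...] = -input[n, target[n, d...], d...] * weight[target[n, d...]]
    gathered = np.take_along_axis(input, np.expand_dims(safe_target, axis=1), axis=1)
    loss = -np.squeeze(gathered, axis=1) * w[safe_target]
    loss = np.where(ignored, 0.0, loss)                 # ignored samples contribute zero loss
    applied_w = np.where(ignored, 0.0, w[safe_target])  # ...and zero weight in the "mean" case
    if reduction == "none":
        return loss
    if reduction == "sum":
        return loss.sum()
    return loss.sum() / applied_w.sum()                 # "mean": sum(loss) / sum(applied weights)
```

With no weight and no ignore_index, every applied weight is 1, so the "mean" branch reduces to the plain mean(loss) case described above.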
```diff
  Example 1:
+
+
- // negative log likelihood loss, "none" reduction
+ // negative log likelihood loss, "none" reduction
- N, C, d1 = 2, 3, 2
+ N, C, d1 = 2, 3, 2
- input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
+ input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
-          [[0.0, 1.0], [2.0, 2.0], [1.0, 2]]]
+          [[0.0, 1.0], [2.0, 2.0], [1.0, 2]]]
- target = [[2, 1], [0, 2]]
+ target = [[2, 1], [0, 2]]
+
- loss = np.zeros((N, d1))
+ loss = np.zeros((N, d1))
- for n in range(N):
+ for n in range(N):
-     for d_1 in range(d1):
+     for d_1 in range(d1):
-         c = target[n][d_1]
+         c = target[n][d_1]
-         loss[n][d_1] = -input[n][c][d_1]
+         loss[n][d_1] = -input[n][c][d_1]
+
- // print(loss)
+ // print(loss)
- // [[-3. -2.]
+ // [[-3. -2.]
- // [-0. -2.]]
+ // [-0. -2.]]
+
+
  Example 2:
+
+
- // weighted negative log likelihood loss, sum reduction
+ // weighted negative log likelihood loss, sum reduction
- N, C, d1 = 2, 3, 2
+ N, C, d1 = 2, 3, 2
- input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
+ input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
-          [[0.0, 1.0], [2.0, 2.0], [1.0, 2]]]
+          [[0.0, 1.0], [2.0, 2.0], [1.0, 2]]]
- target = [[2, 1], [0, 2]]
+ target = [[2, 1], [0, 2]]
- weight = [0.2, 0.3, 0.1]
+ weight = [0.2, 0.3, 0.1]
- loss = np.zeros((N, d1))
+ loss = np.zeros((N, d1))
- for n in range(N):
+ for n in range(N):
-     for d_1 in range(d1):
+     for d_1 in range(d1):
-         c = target[n][d_1]
+         c = target[n][d_1]
-         loss[n][d_1] = -input[n][c][d_1] * weight[c]
+         loss[n][d_1] = -input[n][c][d_1] * weight[c]
+
- loss = np.sum(loss)
+ loss = np.sum(loss)
- // print(loss)
+ // print(loss)
- // -1.1
+ // -1.1
+
+
  Example 3:
+
+
- // weighted negative log likelihood loss, mean reduction
+ // weighted negative log likelihood loss, mean reduction
- N, C, d1 = 2, 3, 2
+ N, C, d1 = 2, 3, 2
- input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
+ input = [[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
-          [[0.0, 1.0], [2.0, 2.0], [1.0, 2]]]
+          [[0.0, 1.0], [2.0, 2.0], [1.0, 2]]]
- target = [[2, 1], [0, 2]]
+ target = [[2, 1], [0, 2]]
- weight = [0.2, 0.3, 0.1]
+ weight = [0.2, 0.3, 0.1]
- loss = np.zeros((N, d1))
+ loss = np.zeros((N, d1))
- weight_total = 0
+ weight_total = 0
- for n in range(N):
+ for n in range(N):
-     for d_1 in range(d1):
+     for d_1 in range(d1):
-         c = target[n][d_1]
+         c = target[n][d_1]
-         loss[n][d_1] = -input[n][c][d_1] * weight[c]
+         loss[n][d_1] = -input[n][c][d_1] * weight[c]
-         weight_total = weight_total + weight[c]
+         weight_total = weight_total + weight[c]
+
- loss = np.sum(loss) / weight_total
+ loss = np.sum(loss) / weight_total
- // print(loss)
+ // print(loss)
- // -1.57
+ // -1.57
+
+
```
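The three examples above are pseudocode (note the `//` comments). With an import, Python comment syntax, and NumPy fancy indexing for the weights, they collapse into one short runnable script; the consolidated version below is an editorial rewrite that reproduces all three printed results.

```python
import numpy as np

N, C, d1 = 2, 3, 2
input = np.array([[[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]],
                  [[0.0, 1.0], [2.0, 2.0], [1.0, 2.0]]])
target = np.array([[2, 1], [0, 2]])
weight = np.array([0.2, 0.3, 0.1])

# Example 1: "none" reduction, loss[n][d] = -input[n][target[n][d]][d].
loss = np.zeros((N, d1))
for n in range(N):
    for d in range(d1):
        c = target[n][d]
        loss[n][d] = -input[n][c][d]
print(loss)                                        # [[-3. -2.] [-0. -2.]]

# Example 2: weighted "sum" reduction; weight[target] scales each sample by weight[c].
weighted = loss * weight[target]
print(np.sum(weighted))                            # ~ -1.1

# Example 3: weighted "mean" reduction divides by the sum of the applied weights.
print(np.sum(weighted) / np.sum(weight[target]))   # ~ -1.5714, printed as -1.57 above
```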
```diff
  ### Attributes
  * **ignore_index - INT** :
  Specifies a target value that is ignored and does not contribute to the input gradient. It's an optional value.
  * **reduction - STRING** (default is 'mean'):
  Type of reduction to apply to loss: none, sum, mean (default). 'none': the output is the loss for each sample. 'sum': the output will be summed. 'mean': the sum of the output will be divided by the sum of applied weights.
  ### Inputs
  Between 2 and 3 inputs.
  - **input** (heterogeneous) - **T**:
  Input tensor of shape (N, C) or (N, C, d1, d2, ..., dk).
  - **target** (heterogeneous) - **Tind**:
  Target tensor of shape (N) or (N, d1, d2, ..., dk). Target element value shall be in range of [0, C). If ignore_index is specified, it may have a value outside [0, C) and the target values should either be in the range [0, C) or have the value ignore_index.
  - **weight** (optional, heterogeneous) - **T**:
  Optional rescaling weight tensor. If given, it has to be a tensor of size C. Otherwise, it is treated as if having all ones.
  ### Outputs
  - **loss** (heterogeneous) - **T**:
  The negative log likelihood loss
  ### Type Constraints
- * **T** in ( tensor(double), tensor(float), tensor(float16) ):
+ * **T** in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ):
  Constrain input, weight, and output types to floating-point tensors.
  * **Tind** in ( tensor(int32), tensor(int64) ):
  Constrain target to integer types
```