Scan - 8 vs 9

The next section compares an older and a newer version of the same operator after both definitions are converted into markdown text. Lines prefixed with + (green) are additions in the newer version; lines prefixed with - (red) are deletions. Anything else is unchanged.

Files changed (1)
  1. Scan8 → Scan9 +68 -64
Scan8 → Scan9 RENAMED
@@ -1 +1 @@
  Scan can be used to iterate over one or more scan_input tensors,
  constructing zero or more scan_output tensors. It combines ideas from general recurrences,
  functional programming constructs such as scan, fold, map, and zip, and is intended to enable
  generalizations of RNN-like constructs for sequence-to-sequence processing.
  Other tensors (referred to as state_variables here) can be used to carry a state
  when iterating from one element to another (similar to hidden-state in RNNs, also referred
- to as loop-carried dependences in the context of loops). All these tensors are required to
+ to as loop-carried dependences in the context of loops).
- have the same shape in each iteration of the loop (a restriction imposed to enable efficient
- memory allocation). Many common usages involve a single scan_input tensor (where functionality
+ Many common usages involve a single scan_input tensor (where functionality
  similar to scan, fold and map can be obtained). When more than one scan_input is used,
  a behavior similar to zip is obtained.
  The attribute body must be a graph, specifying the computation to be performed in
  every iteration. It takes as input the current values of the state_variables and
  the current iterated element of the scan_inputs. It must return the (updated) values
  of the state_variables and zero or more scan_output_element tensors. The values of the
  scan_output_element tensors are concatenated over all the iterations to produce the
  scan_output values of the scan construct (similar to the concatenated intermediate
- hidden-state values of RNN-like constructs).
+ hidden-state values of RNN-like constructs). All the output tensors (state_variables as
+ well as scan_output_element tensors) are required to have the same shape in each iteration
+ of the loop (a restriction imposed to enable efficient memory allocation).
+
+ Note that the iterated element passed to the body subgraph does not have a sequence
+ axis. It will have a rank one less than the rank of the corresponding scan_input.
  The scan operation returns the final values of the state_variables as well as the
  scan_outputs.
+ The optional attribute scan_input_directions specifies the direction (forward or backward)
+ for each scan input. If this attribute is omitted, all sequences are scanned in the forward
+ direction. A bidirectional scan may be performed by specifying the same tensor input twice
- The operation supports batching, and the batch-axis is required to be 0.
+ in the scan_inputs, once with a forward direction, and once with a backward direction.
+ The scan_output of the operation is produced by concatenating the scan_output_element
+ values produced by the body in each iteration. The optional attribute scan_output_directions
+ specifies the direction in which scan_output is constructed (by appending or prepending the
+ scan_output_element to scan_output in each iteration) for each scan_output. If this attribute
- When multiple scan_input tensors are used, they must all have the same batch-size,
+ is omitted, the scan_output_element is appended to the scan_output in each iteration.
- and they must all have the same maximum-sequence-length (the dimensionality of the
- sequence axis or scan axis). The sequence axis or scan axis is required to be 1.
- The operation has an optional sequence_lens input (of shape [BATCH_SIZE]) to
- allow variable length sequences of length <= the maximum-sequence-length. If this
- input is not specified, all sequences are assumed to be of length equal to
- maximum-sequence-length. For variable length input sequences, the scan_outputs
- will consist of a sequence of same length as the input, padded to the
- maximum-sequence-length.
- The optional attribute directions can be used to scan a sequence in the reverse direction.
+ The optional attribute scan_input_axes specifies the axis to be scanned for each scan_input.
+ If omitted, every scan_input will be scanned in axis 0. For example, if axis 0 is the
+ batch axis and axis 1 is the time axis (to be scanned), specify an axis value of 1.
+ Note that scanning a non-zero axis may be less efficient than scanning axis zero.
+
- If this attribute is omitted, all sequences are scanned in the forward direction.
+ The optional attribute scan_output_axes specifies the axis along which the scan_outputs
- A bidirectional scan be performed by specifying the same tensor input twice in the
+ are accumulated for each scan_output. For example, if axis 1 is the time axis (to be
- scan_inputs, once with a forward direction, and once with a backward direction.
+ scanned) for both inputs and outputs, specify a scan_input axis and scan_output axis
+ value of 1.
  Note that because of the ONNX restriction that only the last parameter of an operator can
  be variadic, the initial-states and scan-inputs are listed together as one input parameter.
  Similarly, the final-states and scan-outputs are listed together as one output parameter.
  The attribute num_scan_inputs indicates the number M of scan-inputs.
  The behavior of
  Scan <
  num_scan_inputs = m,
- body = loop-body
+ body = loop-body,
+ scan_input_axes = [axis_1, ..., axis_m]
- > (sequence_lengths, init_1, ..., init_n, scan_1, ..., scan_m)
+ > (init_1, ..., init_n, scan_1, ..., scan_m)
  is equivalent to the following pseudo-code:
- // T.shape[0] denotes the batch-size of T
+ // scan_i.shape[axis_i] denotes the (max) sequence-length of scan_i
- // The batch-size of scan_1, ..., scan_m are all required to be equal
+ // scan_i.shape[axis_i] is required to be equal to scan_j.shape[axis_j] for all i,j.
- batch_size = scan_1.shape[0];
+ sequence_length = scan_1.shape[axis_1];
+ // initialize state-variables
+ st_1 = init_1; ... st_n = init_n;
- // scan_i.shape[1] denotes the (max) sequence-length of scan_i
+ // initialize scan-output variables: [] denotes an empty tensor
- // scan_i.shape[1] is required to be equal to scan_j.shape[1] for all i,j.
- max_sequence_length = scan_1.shape[1];
+ scan_out_1 = []; ...; scan_out_k = [];
+ // identify number of iterations:
+ // execute loop
- for (int batch = 0; batch < batch_size; ++batch) {
+ for (int t = 0; t < sequence_length; ++t) {
+ // generate the scan-input elements: the notation T<axis=k>[t] indicates the sub-tensor
+ // of rank one less than T obtained by indexing T at position t along axis k.
+ si_1 = scan_1<axis=axis_1>[t];
+ ... ;
+ si_m = scan_m<axis=axis_m>[t];
+ // execute loop-body
+ st_1, ..., st_n, so_1, ..., so_k = loop-body(st_1, ..., st_n, si_1, ..., si_m)
- // initialize state-variables
+ // accumulate the scan-output elements
- st_1 = init_1; ... st_n = init_n;
- // initialize scan-output variables: [] denotes an empty tensor
+ scan_out_1 = Concat<axis=0>(scan_out_1, so_1); ... ; scan_out_k = Concat<axis=0>(scan_out_k, so_k);
- scan_out_1 = []; ...; scan_out_k = [];
+ }
- // identify number of iterations:
- N = (sequence_lengths specified) ? sequence_lengths[batch] : max_sequence_length;
- // execute loop
- for (int t = 0; t < N; ++t) {
- // generate the scan-input elements: the notation T<axis=k>[t] indicates the sub-tensor
- // of rank one less than T obtained by indexing T at position t along axis k.
- si_1 = (scan_1<axis=0>[batch])<axis=1>[t];
- ... ;
- si_m = (scan_m<axis=0>[batch])<axis=1>[t];
- // execute loop-body
- st_1, ..., st_n, so_1, ..., so_k = loop-body(st_1, ..., st_n, si_1, ..., si_m)
- // accumulate the scan-output elements
- scan_out_1 = Concat<axis=0>(scan_out_1, so_1); ... ; scan_out_k = Concat<axis=0>(scan_out_k, so_k);
- }
- // accumulate the outputs for this batch:
- bst_1[batch] = st_1; ..., bst_n[batch] = st_n;
- // Note scan-outputs will have size max_sequence_length, but only first N values will be meaningful.
- // The remaining values have an undefined value.
- b_scan_out_1[batch] = scan_out_1; ...; b_scan_out_k[batch] = scan_out_k;
- }
- return bst_1, ..., bst_n, b_scan_out_1, ..., b_scan_out_k;
+ return st_1, ..., st_n, scan_out_1, ..., scan_out_k;
  *Sample usage: Encoding RNN using a Scan*
  The following example shows how a simple RNN over an input tensor %X, with weight tensor %Wi,
  recurrence weight tensor %Ri, bias tensors %Wbi and %Rbi, and initial hidden-state %H_0 can
  be encoded as a ScanLoop. Note that the loop-body is a nested graph, and it directly computes
  %Wi, %Ri, %Wbi, and %Rbi (typically constants or initializers in the body graph). If these
  values are computed in the outer graph, they need to be passed in as extra state_variables.
  graph rnn-encoding {
  %H_0 = ...
  %X = ...
- %Y_h, %Y = Scan[body = <graph rnn-cell-1>, num_scan_inputs=1]("", %H_0, %X)
+ %Y_h, %Y = Scan[body = <graph rnn-cell-1>, num_scan_inputs=1](%H_0, %X)
  return %Y, %Y_h
  }
  graph rnn-cell-1 (
  %H_tminus1[FLOAT, tensor]
  %X_t[FLOAT, tensor]
  ) {
  %Wi = ...
  %Ri = ...
  %Wbi = ...
  %Rbi = ...
  %t1 = X_t * (Wi^T)
  %t2 = H_tminus1*(Ri^T)
  %t3 = Add(%t1, %t2)
  %t4 = Add(%t3, %Wbi)
  %t5 = Add(%t4, %Rbi)
  %Ht = Tanh(%t5)
  %Accumulate = Identity(%Ht)
  return %Ht, %Accumulate
  }
  ### Attributes
  * **body - GRAPH** (required) :
  The graph run each iteration. It has N+M inputs: (loop state variables..., scan_input_elts...). It has N+K outputs: (loop state variables..., scan_output_elts...). Each scan_output is created by concatenating the value of the specified scan_output_elt value at the end of each iteration of the loop. It is an error if the dimensions of these values change across loop iterations.
- * **directions - INTS** :
-
- An optional list of M flags. The i-th element of the list specifies the direction to be scanned for the i-th scan_input tensor: 0 indicates forward direction and 1 indicates reverse direction. If omitted, all scan_input tensors will be scanned in the forward direction.
-
  * **num_scan_inputs - INT** (required) :
  An attribute specifying the number of scan_inputs M.
+ * **scan_input_axes - INTS** :
+
+ An optional list of M flags. The i-th element of the list specifies the axis to be scanned (the sequence axis) for the i-th scan_input. If omitted, 0 will be used as the scan axis for every scan_input.
+
+ * **scan_input_directions - INTS** :
+
+ An optional list of M flags. The i-th element of the list specifies the direction to be scanned for the i-th scan_input tensor: 0 indicates forward direction and 1 indicates reverse direction. If omitted, all scan_input tensors will be scanned in the forward direction.
+
+ * **scan_output_axes - INTS** :
+
+ An optional list of K flags. The i-th element of the list specifies the axis for the i-th scan_output. The scan outputs are accumulated along the specified axis. If omitted, 0 will be used as the scan axis for every scan_output.
+
+ * **scan_output_directions - INTS** :
+
+ An optional list of K flags, one for each scan_output. The i-th element of the list specifies whether the i-th scan_output should be constructed by appending or prepending a new value in each iteration: 0 indicates appending and 1 indicates prepending. If omitted, all scan_output tensors will be produced by appending a value in each iteration.
+
  ### Inputs
- Between 2 and 2147483647 inputs.
+ Between 1 and 2147483647 inputs.
- - **sequence_lens** (optional, heterogeneous) - **I**:
-
- Optional tensor specifying lengths of the sequences in a batch. If this input is not specified, all sequences are assumed to be of the maximum sequence length (the dimension of the sequence axis of the scan_input tensors).
  - **initial_state_and_scan_inputs** (variadic) - **V**:
  Initial values of the loop's N state variables followed by M scan_inputs
  ### Outputs
  Between 1 and 2147483647 outputs.
  - **final_state_and_scan_outputs** (variadic) - **V**:
  Final values of the loop's N state variables followed by K scan_outputs
  ### Type Constraints
- * **I** in ( tensor(int64) ):
-
- Int64 tensor
  * **V** in ( tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ):
  All Tensor types
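The Scan-9 semantics in the diff above can be sketched as a small NumPy reference implementation. The `scan` helper below is hypothetical (it is not part of the onnx package); its keyword arguments mirror the new Scan-9 attributes scan_input_axes, scan_input_directions, scan_output_axes, and scan_output_directions, and the scan_outputs are assembled with a single stack at the end rather than the repeated Concat of the pseudo-code, which is equivalent for fixed-shape elements.

```python
import numpy as np

def scan(body, init_states, scan_inputs,
         scan_input_axes=None, scan_input_directions=None,
         scan_output_axes=None, scan_output_directions=None):
    """Reference sketch of the Scan-9 pseudo-code; not the ONNX runtime API."""
    m = len(scan_inputs)
    in_axes = scan_input_axes or [0] * m
    in_dirs = scan_input_directions or [0] * m        # 0: forward, 1: reverse
    # scan_i.shape[axis_i] is required to agree for all i
    sequence_length = scan_inputs[0].shape[in_axes[0]]
    states = list(init_states)
    accum = None
    for t in range(sequence_length):
        # slice out the iterated elements: rank one less than the scan_input
        elts = []
        for x, ax, d in zip(scan_inputs, in_axes, in_dirs):
            idx = sequence_length - 1 - t if d == 1 else t
            elts.append(np.take(x, idx, axis=ax))
        out = body(*states, *elts)
        n = len(init_states)
        states, so = list(out[:n]), out[n:]
        if accum is None:
            accum = [[e] for e in so]
        else:
            for acc, e in zip(accum, so):
                acc.append(e)                          # append (direction 0)
    k = 0 if accum is None else len(accum)
    out_axes = scan_output_axes or [0] * k
    out_dirs = scan_output_directions or [0] * k       # 1: prepend instead
    scan_outputs = [
        np.stack(acc[::-1] if d == 1 else acc, axis=ax)
        for acc, ax, d in zip(accum or [], out_axes, out_dirs)
    ]
    return states, scan_outputs

# running sum: one state variable, one scan_input, one scan_output
X = np.arange(6.0).reshape(3, 2)
(final,), (csum,) = scan(lambda st, x: (st + x, st + x), [np.zeros(2)], [X])
```

With the zero initial state, `csum` is the running sum of the rows of `X` stacked along axis 0, and `final` equals its last row; passing `scan_input_directions=[1]` scans the rows in reverse, and `scan_output_axes=[1]` would accumulate the outputs along axis 1 instead.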