Cast - 6 vs 21

The next section compares an older and a newer version of the same operator after both definitions are converted into markdown text. Green means an addition in the newer version, red means a deletion. Anything else is unchanged.

Files changed (1)
  1. Cast6 → Cast21 +68 -5
Cast6 → Cast21 RENAMED
@@ -1 +1 @@
  The operator casts the elements of a given input tensor to a data type
  specified by the 'to' argument and returns an output tensor of the same size in
  the converted type. The 'to' argument must be one of the data types specified
  in the 'DataType' enum field in the TensorProto message.
+
+ Casting from string tensor in plain (e.g., "3.14" and "1000") and scientific numeric representations
+ (e.g., "1e-5" and "1E8") to float types is supported. For example, converting string "100.5" to an integer may
+ yield result 100. There are some string literals reserved for special floating-point values;
+ "+INF" (and "INF"), "-INF", and "NaN" are positive infinity, negative infinity, and not-a-number, respectively.
+ Any string which can exactly match "+INF" in a case-insensitive way would be mapped to positive infinity. Similarly,
+ this case-insensitive rule is applied to "INF" and "NaN". When casting from numeric tensors
+ to string tensors, plain floating-point representation (such as "314.15926") would be used.
+ Converting a non-numerical-literal string such as "Hello World!" is undefined behavior. Cases
+ of converting a string representing a floating-point arithmetic value, such as "2.718", to INT are undefined behavior.
+
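The string-to-float rules added above can be illustrated with a minimal Python sketch. The helper name `cast_string_to_float` is hypothetical and not part of ONNX. Treating "-INF" case-insensitively is an assumption (the text states the rule only for "+INF", "INF", and "NaN"), and the fallback to Python's `float()` accepts more spellings than the text lists.

```python
import math

def cast_string_to_float(text: str) -> float:
    """Hypothetical helper: parse one string element following the rules described above."""
    key = text.strip().upper()  # reserved literals are matched case-insensitively
    if key in ("INF", "+INF"):
        return math.inf
    if key == "-INF":  # assumption: same case-insensitive rule as "+INF"
        return -math.inf
    if key == "NAN":
        return math.nan
    # Plain ("3.14") and scientific ("1e-5", "1E8") forms.
    # Non-numeric input raises ValueError here; the text calls that case undefined.
    return float(text)
```

For instance, `cast_string_to_float("1E8")` gives `100000000.0`, and `cast_string_to_float("+Inf")` gives positive infinity.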
+ Conversion from a numerical type to any numerical type is always allowed.
+ Users must be aware of precision loss and value change caused by range difference between two types.
+ For example, a 64-bit float 3.1415926459 may be rounded to a 32-bit float 3.141592. Similarly, converting
+ an integer 36 to Boolean may produce 1 because we truncate bits which can't be stored in the targeted type.
+
+ In more detail, the conversion among numerical types should follow these rules
- NOTE: Casting to and from strings is not supported yet.
+ if the destination type is not a float 8 type.
+
+ * Casting from floating point to:
+   * floating point: +/- infinity if OOR (out of range).
+   * fixed point: undefined if OOR.
+   * bool: +/- 0.0 to False; all else to True.
+ * Casting from fixed point to:
+   * floating point: +/- infinity if OOR. (+ infinity in the case of uint)
+   * fixed point: when OOR, discard higher bits and reinterpret (with respect to two's complement representation for
+     signed types). For example, 200 (int16) -> -56 (int8).
+   * bool: zero to False; nonzero to True.
+ * Casting from bool to:
+   * floating point: {1.0, 0.0}.
+   * fixed point: {1, 0}.
+   * bool: no change.
+
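The fixed-point and bool rules in the list above can be sketched in plain Python. The function names `wrap_to_signed`, `wrap_to_unsigned`, and `to_bool` are hypothetical, chosen only for illustration. The sketch models the "discard higher bits and reinterpret" rule with bit masking and is not taken from any ONNX implementation.

```python
def wrap_to_unsigned(value: int, bits: int) -> int:
    """Keep only the low `bits` bits of an integer (out-of-range values wrap)."""
    return value & ((1 << bits) - 1)

def wrap_to_signed(value: int, bits: int) -> int:
    """Keep the low `bits` bits, then reinterpret them as two's complement."""
    low = wrap_to_unsigned(value, bits)
    sign_bit = 1 << (bits - 1)
    return low - (1 << bits) if low & sign_bit else low

def to_bool(value) -> bool:
    """Zero (including -0.0) maps to False; everything else, NaN included, to True."""
    return value != 0

print(wrap_to_signed(200, 8))  # 200 (int16) -> -56 (int8), as in the rule above
```

Casting the other way follows the last group of rules: `float(True)` is `1.0` and `int(False)` is `0`.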
+ Float 8 types were introduced to speed up the training of
+ deep models. By default the conversion of a float *x* obeys
+ the following rules. [x] means the value rounded to
+ the target mantissa width.
+
+ | x | E4M3FN | E4M3FNUZ | E5M2 | E5M2FNUZ |
+ |------|----|----|----|----|
+ | 0 | 0 | 0 | 0 | 0 |
+ | -0 | -0 | 0 | -0 | 0 |
+ | NaN | NaN | NaN | NaN | NaN |
+ | +/- Inf | +/- FLT_MAX | NaN | FLT_MAX | NaN |
+ | [x] > FLT_MAX | FLT_MAX | FLT_MAX | FLT_MAX | FLT_MAX |
+ | [x] < -FLT_MAX | -FLT_MAX | -FLT_MAX | -FLT_MAX | -FLT_MAX |
+ | else | RNE | RNE | RNE | RNE |
+
+ The behavior changes if the parameter 'saturate' is set to False.
+ The rules then become:
+
+ | x | E4M3FN | E4M3FNUZ | E5M2 | E5M2FNUZ |
+ |------|----|----|----|----|
+ | 0 | 0 | 0 | 0 | 0 |
+ | -0 | -0 | 0 | -0 | 0 |
+ | NaN | NaN | NaN | NaN | NaN |
+ | +/- Inf | NaN | NaN | +/- Inf | NaN |
+ | [x] > FLT_MAX | NaN | NaN | Inf | NaN |
+ | [x] < -FLT_MAX | NaN | NaN | -Inf | NaN |
+ | else | RNE | RNE | RNE | RNE |
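As a rough illustration of how 'saturate' changes the out-of-range handling, the sketch below applies only the E4M3FN column of the two tables. The function name `e4m3fn_out_of_range` is hypothetical. The constant 448.0 for FLT_MAX is an assumption, since the tables do not state the value. The input is assumed to be already rounded to the target mantissa width (the [x] of the tables), so the round-to-nearest-even step itself is not implemented here.

```python
import math

# Largest finite E4M3FN value; assumed for this sketch, not stated in the tables.
E4M3FN_MAX = 448.0

def e4m3fn_out_of_range(rounded: float, saturate: bool = True) -> float:
    """Apply the E4M3FN column of the tables to an already-rounded value [x]."""
    if math.isnan(rounded):
        return math.nan  # NaN stays NaN in both tables
    if math.isinf(rounded) or abs(rounded) > E4M3FN_MAX:
        if not saturate:
            return math.nan  # saturate=False: Inf and overflow become NaN
        return math.copysign(E4M3FN_MAX, rounded)  # saturate=True: clamp to +/- FLT_MAX
    return rounded  # in range (including +/-0): value passes through unchanged
```

With the default `saturate=True`, `e4m3fn_out_of_range(1000.0)` clamps to `448.0`, while `e4m3fn_out_of_range(1000.0, saturate=False)` yields NaN.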
  ### Attributes
+
+ * **saturate - INT** (default is '1'):
+
+ The parameter defines how the conversion behaves if an input value is out of range of the destination type. It applies only to float 8 conversions (float8e4m3fn, float8e4m3fnuz, float8e5m2, float8e5m2fnuz). It is true by default. All cases are fully described in the two tables in the operator description.
  * **to - INT** (required) :
  The data type to which the elements of the input tensor are cast. Strictly must be one of the types from DataType enum in TensorProto
  ### Inputs
  - **input** (heterogeneous) - **T1**:
  Input tensor to be cast.
  ### Outputs
  - **output** (heterogeneous) - **T2**:
  Output tensor with the same shape as input with type specified by the 'to' argument
  ### Type Constraints
- * **T1** in ( tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ):
+ * **T1** in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int4), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint4), tensor(uint64), tensor(uint8) ):
- Constrain input types. Casting from strings and complex are not supported.
+ Constrain input types. Casting from complex is not supported.
- * **T2** in ( tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ):
+ * **T2** in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int4), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint4), tensor(uint64), tensor(uint8) ):
- Constrain output types. Casting to strings and complex are not supported.
+ Constrain output types. Casting to complex is not supported.