Cast - 6 vs 21
The next section compares an older and a newer version of the same operator after both definitions are converted into Markdown text. Green means an addition in the newer version, red means a deletion. Anything else is unchanged.
- Cast6 → Cast21 +68 -5
@@ -1 +1 @@
  The operator casts the elements of a given input tensor to a data type
  specified by the 'to' argument and returns an output tensor of the same size in
  the converted type. The 'to' argument must be one of the data types specified
  in the 'DataType' enum field in the TensorProto message.
+
+ Casting from string tensor in plain (e.g., "3.14" and "1000") and scientific numeric representations
+ (e.g., "1e-5" and "1E8") to float types is supported. For example, converting string "100.5" to an integer may
+ yield result 100. There are some string literals reserved for special floating-point values;
+ "+INF" (and "INF"), "-INF", and "NaN" are positive infinity, negative infinity, and not-a-number, respectively.
+ Any string which can exactly match "+INF" in a case-insensitive way would be mapped to positive infinity. Similarly,
+ this case-insensitive rule is applied to "INF" and "NaN". When casting from numeric tensors
+ to string tensors, plain floating-point representation (such as "314.15926") would be used.
+ Converting a non-numerical-literal string such as "Hello World!" is undefined behavior. Converting
+ a string representing a floating-point value, such as "2.718", to INT is also undefined behavior.
+
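The string rules above can be illustrated with a minimal Python sketch of the string-to-float case. The helper name `cast_string_to_float` is hypothetical and not part of ONNX. It leans on Python's built-in `float()` for plain and scientific notation and handles the reserved literals explicitly. Matching "-INF" case-insensitively is an assumption, since the text states that rule only for "+INF", "INF", and "NaN".

```python
import math


def cast_string_to_float(text):
    """Hypothetical helper: map one string element to a float per the rules above."""
    lowered = text.lower()
    if lowered in ("+inf", "inf"):
        return math.inf
    if lowered == "-inf":  # assumption: same case-insensitive matching as "+INF"
        return -math.inf
    if lowered == "nan":
        return math.nan
    # Plain ("3.14", "1000") and scientific ("1e-5", "1E8") representations.
    return float(text)


for s in ["3.14", "1000", "1e-5", "1E8", "+InF", "inf", "-INF", "nan"]:
    print(repr(s), "->", cast_string_to_float(s))
```

Python raises `ValueError` for an input such as "Hello World!". The operator description leaves that case undefined, so an actual runtime may behave differently.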
+ Conversion from a numerical type to any numerical type is always allowed.
+ Users must be aware of precision loss and value change caused by range difference between two types.
+ For example, a 64-bit float 3.1415926459 may be rounded to a 32-bit float 3.141592. Similarly, converting
+ an integer 36 to Boolean may produce 1 because we truncate bits which can't be stored in the targeted type.
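The precision loss described above can be observed with the standard library alone. The sketch below is only an illustration: `to_float32` is a hypothetical helper that round-trips a Python float (64-bit) through a 32-bit encoding using `struct`.

```python
import struct


def to_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


x = 3.1415926459
y = to_float32(x)
print("float64:", x)
print("float32:", y)
print("absolute error:", abs(x - y))

# An integer cast to Boolean keeps only "zero or nonzero".
print("36 as bool, shown as int:", int(bool(36)))
```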
+
+ In more detail, the conversion among numerical types should follow these rules
-
+ if the destination type is not a float 8 type.
+
+ * Casting from floating point to:
+   * floating point: +/- infinity if OOR (out of range).
+   * fixed point: undefined if OOR.
+   * bool: +/- 0.0 to False; all else to True.
+ * Casting from fixed point to:
+   * floating point: +/- infinity if OOR. (+ infinity in the case of uint)
+   * fixed point: when OOR, discard higher bits and reinterpret (with respect to two's complement representation for
+     signed types). For example, 200 (int16) -> -56 (int8).
+   * bool: zero to False; nonzero to True.
+ * Casting from bool to:
+   * floating point: {1.0, 0.0}.
+   * fixed point: {1, 0}.
+   * bool: no change.
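A few of the rules listed above are easy to check in plain Python. The helpers below (`wrap_to_signed`, `wrap_to_unsigned`, `to_bool`) are hypothetical names used only for illustration. They model the "discard higher bits and reinterpret" rule for fixed-point targets and the zero/nonzero rule for bool.

```python
def wrap_to_unsigned(value, bits):
    """Keep only the low `bits` bits of an integer."""
    return value & ((1 << bits) - 1)


def wrap_to_signed(value, bits):
    """Keep the low `bits` bits and reinterpret them as two's complement."""
    low = wrap_to_unsigned(value, bits)
    return low - (1 << bits) if low >= (1 << (bits - 1)) else low


def to_bool(value):
    """Zero (including -0.0) maps to False; everything else maps to True."""
    return value != 0


print(wrap_to_signed(200, 8))    # the 200 (int16) -> int8 example from the list
print(wrap_to_unsigned(-1, 8))   # a negative value reinterpreted as uint8
print(to_bool(-0.0), to_bool(36))
print(float(True), int(False))   # bool to floating point and to fixed point
```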
+
+ Float 8 types were introduced to speed up the training of
+ deep models. By default the conversion of a float *x* obeys
+ the following rules. [x] means the value rounded to
+ the target mantissa width.
+
+ | x | E4M3FN | E4M3FNUZ | E5M2 | E5M2FNUZ |
+ |------|----|----|----|----|
+ | 0 | 0 | 0 | 0 | 0 |
+ |-0 | -0 | 0 | -0 | 0 |
+ | NaN | NaN | NaN | NaN | NaN |
+ | +/- Inf | +/- FLT_MAX | NaN | FLT_MAX | NaN |
+ | [x] > FLT_MAX | FLT_MAX | FLT_MAX | FLT_MAX | FLT_MAX |
+ | [x] < -FLT_MAX | -FLT_MAX | -FLT_MAX | -FLT_MAX | -FLT_MAX |
+ | else | RNE | RNE | RNE | RNE |
+
+ The behavior changes if the parameter 'saturate' is set to False.
+ The rules then become:
+
+ | x | E4M3FN | E4M3FNUZ | E5M2 | E5M2FNUZ |
+ |------|----|----|----|----|
+ | 0 | 0 | 0 | 0 | 0 |
+ |-0 | -0 | 0 | -0 | 0 |
+ | NaN | NaN | NaN | NaN | NaN |
+ | +/- Inf | NaN | NaN | +/- Inf | NaN |
+ | [x] > FLT_MAX | NaN | NaN | Inf | NaN |
+ | [x] < -FLT_MAX | NaN | NaN | -Inf | NaN |
+ | else | RNE | RNE | RNE | RNE |
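The two tables above can be read as a small decision procedure. The following Python sketch is one possible reading of them, not the reference implementation. Several things in it are assumptions: the helper names are hypothetical, the `FLT_MAX` values (448, 240, 57344, 57344) come from the usual definitions of these float 8 formats rather than from this page, the sign of an infinite input is assumed to carry through to `FLT_MAX` for E5M2 when saturating, and subnormals and underflow are not modeled, so the "else" branch is only meaningful for values well inside the normal range.

```python
import math

# Assumed (largest finite value, mantissa bits) per format; not stated on this page.
FORMATS = {
    "E4M3FN": (448.0, 3),
    "E4M3FNUZ": (240.0, 3),
    "E5M2": (57344.0, 2),
    "E5M2FNUZ": (57344.0, 2),
}


def round_to_mantissa(x, mantissa_bits):
    """[x]: round a finite nonzero float to the given mantissa width, ties to even."""
    mant, exp = math.frexp(x)            # x == mant * 2**exp, 0.5 <= |mant| < 1
    precision = mantissa_bits + 1        # stored bits plus the implicit leading bit
    return math.ldexp(round(mant * 2 ** precision), exp - precision)


def cast_float8(x, fmt, saturate=True):
    """Return the value (as a Python float) that the tables assign to x."""
    flt_max, mantissa_bits = FORMATS[fmt]
    no_negative_zero = fmt.endswith("FNUZ")
    if math.isnan(x):
        return math.nan
    if x == 0:
        return 0.0 if no_negative_zero else x   # -0 survives only without "UZ"
    sign = math.copysign(1.0, x)
    if math.isinf(x):
        if saturate:
            return math.nan if no_negative_zero else sign * flt_max
        return x if fmt == "E5M2" else math.nan
    rounded = round_to_mantissa(x, mantissa_bits)
    if abs(rounded) > flt_max:
        if saturate:
            return sign * flt_max
        return sign * math.inf if fmt == "E5M2" else math.nan
    return rounded                               # "else" row: round to nearest even


for fmt in FORMATS:
    print(fmt, cast_float8(1e6, fmt), cast_float8(1e6, fmt, saturate=False))
```

Note that the overflow test is applied to the rounded value `[x]`, as in the tables: 464 rounds down to 448 for E4M3FN and therefore does not count as out of range.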
  ### Attributes
+
+ * **saturate - INT** (default is '1'):
+
+ The parameter defines how the conversion behaves if an input value is out of range of the destination type. It applies only to float 8 conversions (float8e4m3fn, float8e4m3fnuz, float8e5m2, float8e5m2fnuz). It is true by default. All cases are fully described in the two tables in the operator description.
  * **to - INT** (required) :
  The data type to which the elements of the input tensor are cast. It must be one of the types from the DataType enum in TensorProto.
  ### Inputs
  - **input** (heterogeneous) - **T1**:
  Input tensor to be cast.
  ### Outputs
  - **output** (heterogeneous) - **T2**:
  Output tensor with the same shape as input with type specified by the 'to' argument
  ### Type Constraints
- * **T1** in ( tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ):
+ * **T1** in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int4), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint4), tensor(uint64), tensor(uint8) ):
- Constrain input types. Casting from
+ Constrain input types. Casting from complex is not supported.
- * **T2** in ( tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ):
+ * **T2** in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int4), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint4), tensor(uint64), tensor(uint8) ):
- Constrain output types. Casting to
+ Constrain output types. Casting to complex is not supported.
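The change in the type constraints is easier to see as a set difference. The short Python sketch below transcribes the two T1 lists shown in the diff (the T2 lists are identical) and prints what version 21 adds. The variable names are illustrative only.

```python
# Element types allowed for T1/T2 in each version, transcribed from the diff.
CAST6_TYPES = {
    "bool", "double", "float", "float16",
    "int16", "int32", "int64", "int8",
    "uint16", "uint32", "uint64", "uint8",
}
CAST21_TYPES = {
    "bfloat16", "bool", "double", "float", "float16",
    "float8e4m3fn", "float8e4m3fnuz", "float8e5m2", "float8e5m2fnuz",
    "int16", "int32", "int4", "int64", "int8", "string",
    "uint16", "uint32", "uint4", "uint64", "uint8",
}

added = sorted(CAST21_TYPES - CAST6_TYPES)
removed = sorted(CAST6_TYPES - CAST21_TYPES)

print("added in Cast21:", added)
print("removed since Cast6:", removed)
```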