ai.onnx.ml - LabelEncoder¶

LabelEncoder - 4 (ai.onnx.ml)¶

Version¶

name: LabelEncoder (GitHub)
domain: ai.onnx.ml
since_version: 4
function: False
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 4 of domain ai.onnx.ml.

Summary¶

Maps each element in the input tensor to another value.
The mapping is determined by two parallel attributes: one of the ‘keys_*’ attributes and one of the ‘values_*’ attributes. The i-th value in the specified ‘keys_*’ attribute is mapped to the i-th value in the specified ‘values_*’ attribute. This implies that the input’s element type must be identical to the element type of the specified ‘keys_*’ attribute, while the output type is identical to that of the specified ‘values_*’ attribute. Note that the ‘keys_*’ and ‘values_*’ attributes must have the same length. If an input element cannot be found in the specified ‘keys_*’ attribute, the ‘default_*’ attribute that matches the specified ‘values_*’ attribute may be used as its output value. The type of the ‘default_*’ attribute must match the ‘values_*’ attribute chosen.
Let’s consider an example which maps a string tensor to an integer tensor. Assume ‘keys_strings’ is [“Amy”, “Sally”], ‘values_int64s’ is [5, 6], and ‘default_int64’ is ‘-1’. The input [“Dori”, “Amy”, “Amy”, “Sally”, “Sally”] would be mapped to [-1, 5, 5, 6, 6].
Since this operator is a one-to-one mapping, its input and output shapes are the same. Notice that only one of the ‘keys_*’ attributes and only one of the ‘values_*’ attributes can be set.
Float keys with value ‘NaN’ match any input ‘NaN’ value regardless of bit value. If a key is repeated, the last key takes precedence.
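The lookup rules above can be sketched in plain Python. This is only an illustration under stated assumptions, not the ONNX reference implementation: the function name label_encode is hypothetical, the input is a flat Python list rather than a tensor, and the attributes are passed as ordinary arguments.

```python
import math

def label_encode(xs, keys, values, default):
    """Illustrative sketch of the LabelEncoder mapping (hypothetical helper)."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")

    table = {}
    has_nan_key = False
    nan_value = None
    for k, v in zip(keys, values):
        if isinstance(k, float) and math.isnan(k):
            # Any NaN key matches any NaN input, regardless of bit pattern.
            has_nan_key, nan_value = True, v
        else:
            # A repeated key overwrites the earlier entry: the last one wins.
            table[k] = v

    out = []
    for x in xs:
        if isinstance(x, float) and math.isnan(x):
            out.append(nan_value if has_nan_key else default)
        else:
            out.append(table.get(x, default))
    return out

# The example from the summary: "Dori" is not a key, so it gets the default.
print(label_encode(["Dori", "Amy", "Amy", "Sally", "Sally"],
                   ["Amy", "Sally"], [5, 6], -1))  # [-1, 5, 5, 6, 6]
```

Because each output element depends only on the corresponding input element, the same per-element rule applies to a tensor of any shape, and the output keeps the input's shape.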

Attributes¶

default_float - FLOAT (default is '-0.0'):

A float.
default_int64 - INT (default is '-1'):

An integer.
default_string - STRING (default is '_Unused'):

A string.
default_tensor - TENSOR :

A default tensor. {"_Unused"} if values_* has string type, {-1} if values_* has integral type, and {-0.f} if values_* has float type.
keys_floats - FLOATS :

A list of floats.
keys_int64s - INTS :

A list of ints.
keys_strings - STRINGS :

A list of strings.
keys_tensor - TENSOR :

Keys encoded as a 1D tensor. One and only one of ‘keys_*’s should be set.
values_floats - FLOATS :

A list of floats.
values_int64s - INTS :

A list of ints.
values_strings - STRINGS :

A list of strings.
values_tensor - TENSOR :

Values encoded as a 1D tensor. One and only one of ‘values_*’s should be set.
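As a rough illustration of the "one and only one" rule for the ‘keys_*’ and ‘values_*’ attributes, the check below validates a plain dictionary of attributes. The function name check_label_encoder_attrs is hypothetical, and the sketch assumes the attribute values are Python lists (a real ‘keys_tensor’ or ‘values_tensor’ would be a TensorProto and would need its own length check).

```python
KEY_ATTRS = ("keys_floats", "keys_int64s", "keys_strings", "keys_tensor")
VALUE_ATTRS = ("values_floats", "values_int64s", "values_strings", "values_tensor")

def check_label_encoder_attrs(attrs):
    """Return the (keys, values) attribute names in use, or raise ValueError."""
    keys_set = [name for name in KEY_ATTRS if name in attrs]
    values_set = [name for name in VALUE_ATTRS if name in attrs]

    if len(keys_set) != 1:
        raise ValueError(f"exactly one keys_* attribute must be set, got {keys_set}")
    if len(values_set) != 1:
        raise ValueError(f"exactly one values_* attribute must be set, got {values_set}")

    # The two parallel attributes must have the same length.
    if len(attrs[keys_set[0]]) != len(attrs[values_set[0]]):
        raise ValueError("keys_* and values_* must have the same length")

    return keys_set[0], values_set[0]

print(check_label_encoder_attrs(
    {"keys_strings": ["a", "b", "c"], "values_int64s": [0, 1, 2], "default_int64": 42}
))  # ('keys_strings', 'values_int64s')
```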

Inputs¶

X (heterogeneous) - T1:

Input data. It must have the same element type as the keys_* attribute set.

Outputs¶

Y (heterogeneous) - T2:

Output data. This tensor’s element type is based on the values_* attribute set.

Type Constraints¶

T1 in ( tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(string) ):

The input type is a tensor of any shape.
T2 in ( tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(string) ):

Output type is determined by the specified ‘values_*’ attribute.

Examples¶

_string_int_label_encoder¶

import numpy as np
import onnx
# `expect` is the ONNX backend node-test helper.
from onnx.backend.test.case.node import expect

node = onnx.helper.make_node(
    "LabelEncoder",
    inputs=["X"],
    outputs=["Y"],
    domain="ai.onnx.ml",
    keys_strings=["a", "b", "c"],
    values_int64s=[0, 1, 2],
    default_int64=42,
)
x = np.array(["a", "b", "d", "c", "g"]).astype(object)
y = np.array([0, 1, 42, 2, 42]).astype(np.int64)
expect(
    node,
    inputs=[x],
    outputs=[y],
    name="test_ai_onnx_ml_label_encoder_string_int",
)

node = onnx.helper.make_node(
    "LabelEncoder",
    inputs=["X"],
    outputs=["Y"],
    domain="ai.onnx.ml",
    keys_strings=["a", "b", "c"],
    values_int64s=[0, 1, 2],
)
x = np.array(["a", "b", "d", "c", "g"]).astype(object)
y = np.array([0, 1, -1, 2, -1]).astype(np.int64)
expect(
    node,
    inputs=[x],
    outputs=[y],
    name="test_ai_onnx_ml_label_encoder_string_int_no_default",
)

_tensor_based_label_encoder¶

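import numpy as np
import onnx
# `expect` is the ONNX backend node-test helper.
from onnx.backend.test.case.node import expect
from onnx.helper import make_tensor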

tensor_keys = make_tensor(
    "keys_tensor", onnx.TensorProto.STRING, (3,), ["a", "b", "c"]
)
repeated_string_keys = ["a", "b", "c"]
x = np.array(["a", "b", "d", "c", "g"]).astype(object)
y = np.array([0, 1, 42, 2, 42]).astype(np.int16)

node = onnx.helper.make_node(
    "LabelEncoder",
    inputs=["X"],
    outputs=["Y"],
    domain="ai.onnx.ml",
    keys_tensor=tensor_keys,
    values_tensor=make_tensor(
        "values_tensor", onnx.TensorProto.INT16, (3,), [0, 1, 2]
    ),
    default_tensor=make_tensor(
        "default_tensor", onnx.TensorProto.INT16, (1,), [42]
    ),
)

expect(
    node,
    inputs=[x],
    outputs=[y],
    name="test_ai_onnx_ml_label_encoder_tensor_mapping",
)

node = onnx.helper.make_node(
    "LabelEncoder",
    inputs=["X"],
    outputs=["Y"],
    domain="ai.onnx.ml",
    keys_strings=repeated_string_keys,
    values_tensor=make_tensor(
        "values_tensor", onnx.TensorProto.INT16, (3,), [0, 1, 2]
    ),
    default_tensor=make_tensor(
        "default_tensor", onnx.TensorProto.INT16, (1,), [42]
    ),
)

expect(
    node,
    inputs=[x],
    outputs=[y],
    name="test_ai_onnx_ml_label_encoder_tensor_value_only_mapping",
)


LabelEncoder - 2 (ai.onnx.ml)¶

Version¶

name: LabelEncoder (GitHub)
domain: ai.onnx.ml
since_version: 2
function: False
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 2 of domain ai.onnx.ml.

Summary¶

Maps each element in the input tensor to another value.
The mapping is determined by two parallel attributes: one of the ‘keys_*’ attributes and one of the ‘values_*’ attributes. The i-th value in the specified ‘keys_*’ attribute is mapped to the i-th value in the specified ‘values_*’ attribute. This implies that the input’s element type must be identical to the element type of the specified ‘keys_*’ attribute, while the output type is identical to that of the specified ‘values_*’ attribute. If an input element cannot be found in the specified ‘keys_*’ attribute, the ‘default_*’ attribute that matches the specified ‘values_*’ attribute may be used as its output value.
Let’s consider an example which maps a string tensor to an integer tensor. Assume ‘keys_strings’ is [“Amy”, “Sally”], ‘values_int64s’ is [5, 6], and ‘default_int64’ is ‘-1’. The input [“Dori”, “Amy”, “Amy”, “Sally”, “Sally”] would be mapped to [-1, 5, 5, 6, 6].
Since this operator is a one-to-one mapping, its input and output shapes are the same. Notice that only one of the ‘keys_*’ attributes and only one of the ‘values_*’ attributes can be set.
For key look-up, bit-wise comparison is used, so even a float NaN can be mapped to a value in the ‘values_*’ attribute.
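To see why bit-wise comparison lets a NaN key match, the sketch below compares floats by their single-precision bit pattern instead of with ==. It is an illustration only: float32_bits and label_encode_bitwise are hypothetical names, and packing values as 32-bit floats is an assumption about how the keys are stored.

```python
import struct

def float32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def label_encode_bitwise(xs, keys_floats, values, default):
    """Look up each input by the bit pattern of its float32 encoding."""
    table = {float32_bits(k): v for k, v in zip(keys_floats, values)}
    return [table.get(float32_bits(x), default) for x in xs]

nan = float("nan")
print(nan == nan)                              # False: NaN never equals itself
print(float32_bits(nan) == float32_bits(nan))  # True: identical bit patterns
print(label_encode_bitwise([nan, 2.0, 3.0], [nan, 2.0], [10, 20], -1))  # [10, 20, -1]
```

Under this scheme a NaN input only matches a NaN key with the same bit pattern, which is the behavior that version 4 relaxes so that any NaN matches.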

Attributes¶

default_float - FLOAT (default is '-0.0'):

A float.
default_int64 - INT (default is '-1'):

An integer.
default_string - STRING (default is '_Unused'):

A string.
keys_floats - FLOATS :

A list of floats.
keys_int64s - INTS :

A list of ints.
keys_strings - STRINGS :

A list of strings. One and only one of ‘keys_*’s should be set.
values_floats - FLOATS :

A list of floats.
values_int64s - INTS :

A list of ints.
values_strings - STRINGS :

A list of strings. One and only one of ‘values_*’s should be set.

Inputs¶

X (heterogeneous) - T1:

Input data. It can be either a tensor or a scalar.

Outputs¶

Y (heterogeneous) - T2:

Output data.

Type Constraints¶

T1 in ( tensor(float), tensor(int64), tensor(string) ):

The input type is a tensor of any shape.
T2 in ( tensor(float), tensor(int64), tensor(string) ):

Output type is determined by the specified ‘values_*’ attribute.

LabelEncoder - 1 (ai.onnx.ml)¶

Version¶

name: LabelEncoder (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 1 of domain ai.onnx.ml.

Summary¶

Converts strings to integers and vice versa.
If the string default value is set, it will convert integers to strings. If the int default value is set, it will convert strings to integers.
Each operator converts either integers to strings or strings to integers, depending on which default value attribute is provided. Only one default value attribute should be defined.
When converting from integers to strings, the string is fetched from the ‘classes_strings’ list, by simple indexing.
When converting from strings to integers, the string is looked up in the list and the index at which it is found is used as the converted value.
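The index-based conversion described above can be sketched as follows. This is a simplified illustration, not the operator's implementation: the name label_encode_v1 is hypothetical, the direction is chosen from the Python type of each input element rather than from which default attribute is defined, and out-of-range (including negative) indices are assumed to fall back to the string default.

```python
def label_encode_v1(xs, classes_strings, default_int64=-1, default_string="_Unused"):
    """Illustrative sketch of version 1: strings <-> positions in classes_strings."""
    out = []
    for x in xs:
        if isinstance(x, str):
            # String -> integer: use the index at which the string is found.
            if x in classes_strings:
                out.append(classes_strings.index(x))
            else:
                out.append(default_int64)
        else:
            # Integer -> string: fetch the label by simple indexing.
            if 0 <= x < len(classes_strings):
                out.append(classes_strings[x])
            else:
                out.append(default_string)
    return out

labels = ["a", "b", "c"]
print(label_encode_v1(["b", "z"], labels))  # [1, -1]
print(label_encode_v1([2, 5], labels))      # ['c', '_Unused']
```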

Attributes¶

classes_strings - STRINGS :

A list of labels.
default_int64 - INT (default is '-1'):

An integer to use when an input string value is not found in the map.
One and only one of the ‘default_*’ attributes must be defined.
default_string - STRING (default is '_Unused'):

A string to use when an input integer value is not found in the map.
One and only one of the ‘default_*’ attributes must be defined.

Inputs¶

X (heterogeneous) - T1:

Input data.

Outputs¶

Y (heterogeneous) - T2:

Output data. If strings are input, the output values are integers, and vice versa.

Type Constraints¶

T1 in ( tensor(int64), tensor(string) ):

The input type must be a tensor of integers or strings, of any shape.
T2 in ( tensor(int64), tensor(string) ):

The output type will be a tensor of strings or integers, and will have the same shape as the input.