ai.onnx.ml - LabelEncoder

LabelEncoder - 4 (ai.onnx.ml)

Version

  • name: LabelEncoder

  • domain: ai.onnx.ml

  • since_version: 4

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 4 of domain ai.onnx.ml.

Summary

Maps each element in the input tensor to another value.

The mapping is determined by two parallel attributes, ‘keys_*’ and ‘values_*’. The i-th value in the specified ‘keys_*’ attribute is mapped to the i-th value in the specified ‘values_*’ attribute. This implies that the input’s element type must be identical to the element type of the specified ‘keys_*’ attribute, while the output type is identical to that of the specified ‘values_*’ attribute. The ‘keys_*’ and ‘values_*’ attributes must have the same length. If an input element cannot be found in the specified ‘keys_*’ attribute, the ‘default_*’ attribute that matches the specified ‘values_*’ attribute is used as its output value. The type of the ‘default_*’ attribute must match the ‘values_*’ attribute chosen.

Consider an example that maps a string tensor to an integer tensor. Assume ‘keys_strings’ is [“Amy”, “Sally”], ‘values_int64s’ is [5, 6], and ‘default_int64’ is ‘-1’. The input [“Dori”, “Amy”, “Amy”, “Sally”, “Sally”] would be mapped to [-1, 5, 5, 6, 6].

Because this operator is a one-to-one mapping, its input and output shapes are the same. Only one of the ‘keys_*’ attributes and only one of the ‘values_*’ attributes can be set.

Float keys with value ‘NaN’ match any input ‘NaN’ value regardless of bit value. If a key is repeated, the last key takes precedence.
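The lookup rules above can be sketched in plain Python. This is only an illustration of the described semantics, not the reference implementation: the function name `label_encode` is hypothetical, and it works on a flat list, whereas the operator preserves the input tensor's shape.

```python
import math


def label_encode(xs, keys, values, default):
    """Map each element of xs through the parallel keys/values lists."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")

    table = {}
    has_nan_key = False
    nan_value = None
    for key, value in zip(keys, values):
        if isinstance(key, float) and math.isnan(key):
            # NaN never compares equal to itself, so track it separately.
            has_nan_key = True
            nan_value = value
        else:
            # A repeated key overwrites the earlier entry: the last one wins.
            table[key] = value

    out = []
    for x in xs:
        if isinstance(x, float) and math.isnan(x):
            # Any input NaN matches a NaN key, whatever its bit pattern.
            out.append(nan_value if has_nan_key else default)
        else:
            out.append(table.get(x, default))
    return out


result = label_encode(
    ["Dori", "Amy", "Amy", "Sally", "Sally"], ["Amy", "Sally"], [5, 6], -1
)
print(result)  # [-1, 5, 5, 6, 6]
```

The call at the end reproduces the Amy/Sally example from the summary, with “Dori” falling back to the default.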

Attributes

  • default_float - FLOAT (default is '-0.0'):

    A float.

  • default_int64 - INT (default is '-1'):

    An integer.

  • default_string - STRING (default is '_Unused'):

    A string.

  • default_tensor - TENSOR :

    A default tensor. {“_Unused”} if values_* has string type, {-1} if values_* has integral type, and {-0.f} if values_* has float type.

  • keys_floats - FLOATS :

    A list of floats.

  • keys_int64s - INTS :

    A list of ints.

  • keys_strings - STRINGS :

    A list of strings.

  • keys_tensor - TENSOR :

    Keys encoded as a 1D tensor. One and only one of ‘keys_*’s should be set.

  • values_floats - FLOATS :

    A list of floats.

  • values_int64s - INTS :

    A list of ints.

  • values_strings - STRINGS :

    A list of strings.

  • values_tensor - TENSOR :

    Values encoded as a 1D tensor. One and only one of ‘values_*’s should be set.
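The attribute constraints above (exactly one ‘keys_*’ attribute, exactly one ‘values_*’ attribute, equal lengths) can be checked with a small helper. This is a sketch under stated assumptions: `check_label_encoder_attrs` is a hypothetical name, attributes are passed as a plain dict, and tensor attributes are represented as ordinary lists rather than TensorProto objects.

```python
KEY_ATTRS = ("keys_floats", "keys_int64s", "keys_strings", "keys_tensor")
VALUE_ATTRS = ("values_floats", "values_int64s", "values_strings", "values_tensor")


def check_label_encoder_attrs(attrs):
    """Return the (keys, values) attribute names in use, or raise ValueError."""
    keys_set = [name for name in KEY_ATTRS if name in attrs]
    values_set = [name for name in VALUE_ATTRS if name in attrs]
    if len(keys_set) != 1:
        raise ValueError(f"expected exactly one keys_* attribute, got {keys_set}")
    if len(values_set) != 1:
        raise ValueError(f"expected exactly one values_* attribute, got {values_set}")
    # The two parallel lists must line up element by element.
    if len(attrs[keys_set[0]]) != len(attrs[values_set[0]]):
        raise ValueError("keys_* and values_* must have the same length")
    return keys_set[0], values_set[0]


chosen = check_label_encoder_attrs(
    {"keys_strings": ["a", "b", "c"], "values_int64s": [0, 1, 2]}
)
print(chosen)  # ('keys_strings', 'values_int64s')
```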

Inputs

  • X (heterogeneous) - T1:

    Input data. It must have the same element type as the keys_* attribute set.

Outputs

  • Y (heterogeneous) - T2:

    Output data. This tensor’s element type is based on the values_* attribute set.

Type Constraints

  • T1 in ( tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(string) ):

    The input type is a tensor of any shape.

  • T2 in ( tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(string) ):

    Output type is determined by the specified ‘values_*’ attribute.

Examples

_string_int_label_encoder

import numpy as np
import onnx

# expect() is the checker used by the ONNX backend node tests.
from onnx.backend.test.case.node import expect

# Case 1: unknown inputs fall back to the explicit default_int64 (42).
node = onnx.helper.make_node(
    "LabelEncoder",
    inputs=["X"],
    outputs=["Y"],
    domain="ai.onnx.ml",
    keys_strings=["a", "b", "c"],
    values_int64s=[0, 1, 2],
    default_int64=42,
)
x = np.array(["a", "b", "d", "c", "g"]).astype(object)
y = np.array([0, 1, 42, 2, 42]).astype(np.int64)
expect(
    node,
    inputs=[x],
    outputs=[y],
    name="test_ai_onnx_ml_label_encoder_string_int",
)

# Case 2: no default given, so unknown inputs map to the built-in default -1.
node = onnx.helper.make_node(
    "LabelEncoder",
    inputs=["X"],
    outputs=["Y"],
    domain="ai.onnx.ml",
    keys_strings=["a", "b", "c"],
    values_int64s=[0, 1, 2],
)
x = np.array(["a", "b", "d", "c", "g"]).astype(object)
y = np.array([0, 1, -1, 2, -1]).astype(np.int64)
expect(
    node,
    inputs=[x],
    outputs=[y],
    name="test_ai_onnx_ml_label_encoder_string_int_no_default",
)

_tensor_based_label_encoder

import numpy as np
import onnx

# expect() is the checker used by the ONNX backend node tests.
from onnx.backend.test.case.node import expect
from onnx.helper import make_tensor

tensor_keys = make_tensor(
    "keys_tensor", onnx.TensorProto.STRING, (3,), ["a", "b", "c"]
)
repeated_string_keys = ["a", "b", "c"]
x = np.array(["a", "b", "d", "c", "g"]).astype(object)
y = np.array([0, 1, 42, 2, 42]).astype(np.int16)

# Case 1: keys, values, and default are all supplied as tensors.
node = onnx.helper.make_node(
    "LabelEncoder",
    inputs=["X"],
    outputs=["Y"],
    domain="ai.onnx.ml",
    keys_tensor=tensor_keys,
    values_tensor=make_tensor(
        "values_tensor", onnx.TensorProto.INT16, (3,), [0, 1, 2]
    ),
    default_tensor=make_tensor(
        "default_tensor", onnx.TensorProto.INT16, (1,), [42]
    ),
)

expect(
    node,
    inputs=[x],
    outputs=[y],
    name="test_ai_onnx_ml_label_encoder_tensor_mapping",
)

# Case 2: keys come from keys_strings; only values and default are tensors.
node = onnx.helper.make_node(
    "LabelEncoder",
    inputs=["X"],
    outputs=["Y"],
    domain="ai.onnx.ml",
    keys_strings=repeated_string_keys,
    values_tensor=make_tensor(
        "values_tensor", onnx.TensorProto.INT16, (3,), [0, 1, 2]
    ),
    default_tensor=make_tensor(
        "default_tensor", onnx.TensorProto.INT16, (1,), [42]
    ),
)

expect(
    node,
    inputs=[x],
    outputs=[y],
    name="test_ai_onnx_ml_label_encoder_tensor_value_only_mapping",
)

LabelEncoder - 2 (ai.onnx.ml)

Version

  • name: LabelEncoder

  • domain: ai.onnx.ml

  • since_version: 2

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 2 of domain ai.onnx.ml.

Summary

Maps each element in the input tensor to another value.

The mapping is determined by two parallel attributes, ‘keys_*’ and ‘values_*’. The i-th value in the specified ‘keys_*’ attribute is mapped to the i-th value in the specified ‘values_*’ attribute. This implies that the input’s element type must be identical to the element type of the specified ‘keys_*’ attribute, while the output type is identical to that of the specified ‘values_*’ attribute. If an input element cannot be found in the specified ‘keys_*’ attribute, the ‘default_*’ attribute that matches the specified ‘values_*’ attribute is used as its output value.

Consider an example that maps a string tensor to an integer tensor. Assume ‘keys_strings’ is [“Amy”, “Sally”], ‘values_int64s’ is [5, 6], and ‘default_int64’ is ‘-1’. The input [“Dori”, “Amy”, “Amy”, “Sally”, “Sally”] would be mapped to [-1, 5, 5, 6, 6].

Because this operator is a one-to-one mapping, its input and output shapes are the same. Only one of the ‘keys_*’ attributes and only one of the ‘values_*’ attributes can be set.

For key look-up, bit-wise comparison is used, so even a float NaN can be mapped to a value in the ‘values_*’ attribute.
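Bit-wise key comparison for floats can be illustrated by comparing the packed float32 bytes instead of the float values. This is a minimal sketch, not the reference implementation: `float32_bits` and `lookup_bitwise` are hypothetical names, and the use of float32 is an assumption based on the tensor(float) type constraint.

```python
import struct


def float32_bits(x):
    """Return the 4 raw bytes of x encoded as an IEEE 754 float32."""
    return struct.pack("<f", x)


def lookup_bitwise(x, keys, values, default):
    """Look up x by comparing float32 bit patterns rather than values."""
    table = {float32_bits(k): v for k, v in zip(keys, values)}
    return table.get(float32_bits(x), default)


nan = float("nan")
print(nan == nan)                                # False: value comparison fails
print(lookup_bitwise(nan, [nan], [3.0], -0.0))   # 3.0: the bit patterns match
```

Under a strictly bit-wise reading, 0.0 and -0.0 would also be distinct keys, since their encodings differ in the sign bit.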

Attributes

  • default_float - FLOAT (default is '-0.0'):

    A float.

  • default_int64 - INT (default is '-1'):

    An integer.

  • default_string - STRING (default is '_Unused'):

    A string.

  • keys_floats - FLOATS :

    A list of floats.

  • keys_int64s - INTS :

    A list of ints.

  • keys_strings - STRINGS :

    A list of strings. One and only one of ‘keys_*’s should be set.

  • values_floats - FLOATS :

    A list of floats.

  • values_int64s - INTS :

    A list of ints.

  • values_strings - STRINGS :

    A list of strings. One and only one of ‘values_*’s should be set.

Inputs

  • X (heterogeneous) - T1:

    Input data. It can be either tensor or scalar.

Outputs

  • Y (heterogeneous) - T2:

    Output data.

Type Constraints

  • T1 in ( tensor(float), tensor(int64), tensor(string) ):

    The input type is a tensor of any shape.

  • T2 in ( tensor(float), tensor(int64), tensor(string) ):

    Output type is determined by the specified ‘values_*’ attribute.

LabelEncoder - 1 (ai.onnx.ml)

Version

  • name: LabelEncoder

  • domain: ai.onnx.ml

  • since_version: 1

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 1 of domain ai.onnx.ml.

Summary

Converts strings to integers and vice versa.

If the string default value is set, it will convert integers to strings. If the int default value is set, it will convert strings to integers.

Each operator converts either integers to strings or strings to integers, depending on which default value attribute is provided. Only one default value attribute should be defined.

When converting from integers to strings, the string is fetched from the ‘classes_strings’ list by simple indexing.

When converting from strings to integers, the string is looked up in the list and the index at which it is found is used as the converted value.
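The two conversion directions can be sketched as follows. The function names are hypothetical and the code works on flat lists. It also makes two assumptions the text above does not settle: an out-of-range or negative integer is treated as "not found", and when a label is repeated the last index is used.

```python
def strings_to_ints(xs, classes_strings, default_int64=-1):
    """Replace each string by its index in classes_strings."""
    # Assumption: with repeated labels, the last index wins.
    index = {label: i for i, label in enumerate(classes_strings)}
    return [index.get(x, default_int64) for x in xs]


def ints_to_strings(xs, classes_strings, default_string="_Unused"):
    """Replace each integer by the label stored at that index."""
    n = len(classes_strings)
    # Assumption: indices outside [0, n) count as "not found".
    return [classes_strings[x] if 0 <= x < n else default_string for x in xs]


classes = ["a", "b", "c"]
print(strings_to_ints(["b", "z", "a"], classes))  # [1, -1, 0]
print(ints_to_strings([2, 5, 0], classes))        # ['c', '_Unused', 'a']
```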

Attributes

  • classes_strings - STRINGS :

    A list of labels.

  • default_int64 - INT (default is '-1'):

    An integer to use when an input string value is not found in the map. One and only one of the ‘default_*’ attributes must be defined.

  • default_string - STRING (default is '_Unused'):

    A string to use when an input integer value is not found in the map. One and only one of the ‘default_*’ attributes must be defined.

Inputs

  • X (heterogeneous) - T1:

    Input data.

Outputs

  • Y (heterogeneous) - T2:

    Output data. If strings are input, the output values are integers, and vice versa.

Type Constraints

  • T1 in ( tensor(int64), tensor(string) ):

    The input type must be a tensor of integers or strings, of any shape.

  • T2 in ( tensor(int64), tensor(string) ):

    The output type will be a tensor of strings or integers, and will have the same shape as the input.