API Summary#

Summary of public functions and classes exposed in sklearn-onnx.

Version#

skl2onnx.get_latest_tested_opset_version()[source]#

This module relies on onnxruntime to test every converter. The function returns the most recent target opset tested with onnxruntime, or the opset version supported by the installed onnx package (returned by onnx.defs.onnx_opset_version()) if that one is lower.

Converters#

Both functions convert a scikit-learn model into ONNX. The first one lets the user manually define the names and types of the inputs. The second one infers this information from the training data. These two functions are the main entry points to the converter. The rest of the API is needed if a model has no converter implemented in this package. A new converter then has to be registered, whether it is imported from another package or created from scratch.

skl2onnx.convert_sklearn(model, name=None, initial_types=None, doc_string='', target_opset=None, custom_conversion_functions=None, custom_shape_calculators=None, custom_parsers=None, options=None, intermediate=False, white_op=None, black_op=None, final_types=None, dtype=None, naming=None, model_optim=True, verbose=0)[source]#

This function produces an ONNX model equivalent to the given scikit-learn model. The list of supported converters is returned by the function supported_converters.

For pipeline conversion, the user needs to make sure each component is one of the supported items. Note that initial types are required for all conversions. The ONNX model name can also be specified.

Parameters:
  • model – A scikit-learn model

  • initial_types – a python list. Each element is a tuple of a variable name and a type defined in data_types.py

  • name – The name of the graph (type: GraphProto) in the produced ONNX model (type: ModelProto)

  • doc_string – A string attached onto the produced ONNX model

  • target_opset – a number, for example 7 for ONNX 1.2 and 8 for ONNX 1.3. If the value is not specified, the function chooses the latest tested opset (see skl2onnx.get_latest_tested_opset_version())

  • custom_conversion_functions – a dictionary specifying user-defined conversion functions, it takes precedence over registered converters

  • custom_shape_calculators – a dictionary specifying user-defined shape calculators, it takes precedence over registered shape calculators

  • custom_parsers – parsers determine which outputs are expected for a particular task. Default parsers are defined for classifiers, regressors and pipelines, but they can be overridden. custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }

  • options – specific options given to converters (see Converters with options)

  • intermediate – if True, the function returns the converted model and the instance of Topology used, it returns the converted model otherwise

  • white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed

  • black_op – black list of ONNX nodes disallowed while converting a pipeline, if empty, none are blacklisted

  • final_types – a python list. Works the same way as initial_types but is not mandatory, it is used to overwrite the type (if type is not None) and the name of every output.

  • dtype – removed in version 1.7.5, dtype is now inferred from input types, converters may add Cast operators to switch to double when necessary

  • naming – the user may want to change the way intermediate results are named. This parameter can be a string (a prefix) or a function whose signature is get_name(name, existing_names). The library then checks that this name is unique and modifies it if not

  • model_optim – enable or disable model optimisation after the model was converted into onnx, it reduces the number of identity nodes

  • verbose – display progress while converting a model

Returns:

An ONNX model (type: ModelProto) which is equivalent to the input scikit-learn model

Example of initial_types: Assume that the specified scikit-learn model takes a heterogeneous list as its input. If the first 5 elements are floats and the last 10 elements are integers, we need to specify initial types as below. The None in [None, 5] indicates that the batch size is unknown.

from skl2onnx.common.data_types import FloatTensorType, Int64TensorType
initial_type = [('float_input', FloatTensorType([None, 5])),
                ('int64_input', Int64TensorType([None, 10]))]

Note

If a pipeline includes an instance of ColumnTransformer, scikit-learn allows the user to specify columns by name. This option is not supported by sklearn-onnx because feature names could differ between the input data and the ONNX graph (defined by parameter initial_types). Only integers are supported.

Converters options#

Some ONNX operators expose parameters sklearn-onnx cannot guess from the raw model. Default values are usually suggested, but users may have to overwrite them manually. Doing so is not straightforward when a model is included in a pipeline. That’s why these options can be given to function convert_sklearn as a dictionary {model_type: parameters in a dictionary} or {model_id: parameters in a dictionary}. Option separators is used to specify the delimiters between two words when the ONNX graph needs to tokenize a string. The default value is short and may not include all the necessary values. It can be overwritten as:

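from sklearn.feature_extraction.text import TfidfVectorizer
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType

# model is assumed to be a fitted TfidfVectorizer
extra = {TfidfVectorizer: {"separators": [' ', '[.]', '\\?',
            ',', ';', ':', '\\!', '\\(', '\\)']}}
model_onnx = convert_sklearn(
    model, "tfidf",
    initial_types=[("input", StringTensorType([None, 1]))],
    options=extra)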

If a pipeline contains two models of the same class, it is possible to distinguish between the two with function id:

extra = {id(model): {"separators": [' ', '.', '\\?', ',', ';',
            ':', '\\!', '\\(', '\\)']}}
model_onnx = convert_sklearn(
    pipeline, "pipeline-with-2-tfidf",
    initial_types=[("input", StringTensorType([None, 1]))],
    options=extra)

It is used in example TfIdfVectorizer with ONNX.

Changed in version 1.10.0: Parameter naming was added.

skl2onnx.to_onnx(model, X=None, name=None, initial_types=None, target_opset=None, options=None, white_op=None, black_op=None, final_types=None, dtype=None, naming=None, model_optim=True, verbose=0)[source]#

Calls convert_sklearn() with simplified parameters.

Parameters:
  • model – model to convert

  • X – training set, can be None, it is used to infer the input types (initial_types)

  • initial_types – if X is None, then initial_types must be defined

  • target_opset – conversion with a specific target opset

  • options – specific options given to converters (see Converters with options)

  • name – name of the model

  • white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed

  • black_op – black list of ONNX nodes disallowed while converting a pipeline, if empty, none are blacklisted

  • final_types – a python list. Works the same way as initial_types but is not mandatory, it is used to overwrite the type (if type is not None) and the name of every output.

  • dtype – removed in version 1.7.5, dtype is now inferred from input types, converters may add Cast operators to switch to double when necessary

  • naming – the user may want to change the way intermediate results are named. This parameter can be a string (a prefix) or a function whose signature is get_name(name, existing_names). The library then checks that this name is unique and modifies it if not

  • model_optim – enable or disable model optimisation after the model was converted into onnx, it reduces the number of identity nodes

  • verbose – display progress while converting a model

Returns:

converted model

This function checks whether the model inherits from class OnnxOperatorMixin. If it does, the function calls the model's to_onnx method, otherwise it calls convert_sklearn().

Changed in version 1.10.0: Parameter naming was added.

Logging#

The conversion of a pipeline fails if it contains an object without any associated converter. It may also fail if one of the objects is mapped by a custom converter. If the error message is not explicit enough, it is possible to enable logging:

import logging
logger = logging.getLogger('skl2onnx')
logger.setLevel(logging.DEBUG)

Example Logging, verbose illustrates what it looks like.

Register a new converter#

If a model has no converter implemented in this package, a new converter has then to be registered, whether it is imported from another package or created from scratch. Section Covered Converters lists all available converters.

skl2onnx.supported_converters(from_sklearn=False)[source]#

Returns the list of supported converters. To find the converter associated to a specific model, the library gets the name of the model class, adds 'Sklearn' as a prefix and retrieves the associated converter if available.

Parameters:

from_sklearn – every supported model is mapped to a converter by a name prefixed with 'Sklearn', the prefix is removed if this parameter is False but the function only returns converters whose name is prefixed by 'Sklearn'

Returns:

list of supported models as string

skl2onnx.update_registered_converter(model, alias, shape_fct, convert_fct, overwrite=True, parser=None, options=None)[source]#

Registers or updates a converter for a new model so that it can be converted when inserted in a scikit-learn pipeline.

Parameters:
  • model – model class

  • alias – alias used to register the model

  • shape_fct – function which checks or modifies the expected outputs, this function should be fast so that the whole graph can be computed followed by the conversion of each model, parallelized or not

  • convert_fct – function which converts a model

  • overwrite – False to raise exception if a converter already exists

  • parser – overwrites the parser as well if not empty

  • options – registered options for this converter

The alias is usually the library name followed by the model name. Example:

from sklearn.linear_model import SGDClassifier
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from skl2onnx.operator_converters.RandomForest import convert_sklearn_random_forest_classifier
from skl2onnx import update_registered_converter
update_registered_converter(
        SGDClassifier, 'SklearnLinearClassifier',
        calculate_linear_classifier_output_shapes,
        convert_sklearn_random_forest_classifier,
        options={'zipmap': [True, False, 'columns'],
                 'output_class_labels': [False, True],
                 'raw_scores': [True, False]})

The function does not update the parser if not specified except if option ‘zipmap’ is added to the list. Every classifier must declare this option to let the default parser automatically handle that option.

skl2onnx.update_registered_parser(model, parser_fct)[source]#

Registers or updates a parser for a new model. A parser returns the expected output of a model.

Parameters:
  • model – model class

  • parser_fct – parser, signature is the same as parse_sklearn

Helpers for new converters#

skl2onnx.helpers.add_onnx_graph(scope: Scope, operator: Operator, container: ModelComponentContainer, onx: ModelProto)[source]#

Adds a whole ONNX graph to an existing one following skl2onnx API assuming this ONNX graph implements an operator.

Parameters:
  • scope – scope (to get unique names)

  • operator – operator

  • container – container

  • onx – ONNX graph

Manipulate ONNX graphs#

skl2onnx.helpers.onnx_helper.enumerate_model_node_outputs(model, add_node=False)[source]#

Enumerates all the nodes of a model.

Parameters:
  • model – ONNX graph

  • add_node – if False, the function enumerates all output names from every node, otherwise, it enumerates tuple (output name, node)

Returns:

enumerator

skl2onnx.helpers.onnx_helper.load_onnx_model(onnx_file_or_bytes)[source]#

Loads an ONNX file.

Parameters:

onnx_file_or_bytes – ONNX file or bytes

Returns:

ONNX model

skl2onnx.helpers.onnx_helper.select_model_inputs_outputs(model, outputs=None, inputs=None)[source]#

Takes a model and changes its outputs.

Parameters:
  • model – ONNX model

  • inputs – new inputs

  • outputs – new outputs

Returns:

modified model

The function removes unneeded nodes.

skl2onnx.helpers.onnx_helper.save_onnx_model(model, filename=None)[source]#

Saves a model as a file or bytes.

Parameters:
  • model – ONNX model

  • filename – filename or None to return bytes

Returns:

bytes

Parsers#

skl2onnx._parse.parse_sklearn(scope, model, inputs, custom_parsers=None, final_types=None)[source]#

This is a delegate function. It does nothing but invoke the correct parsing function according to the input model’s type.

Parameters:
  • scope – Scope object

  • model – A scikit-learn object (e.g., OneHotEncoder and LogisticRegression)

  • inputs – A list of variables

  • custom_parsers – parsers determine which outputs are expected for a particular task. Default parsers are defined for classifiers, regressors and pipelines, but they can be overridden. custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }

  • final_types – a python list. Works the same way as initial_types but is not mandatory, it is used to overwrite the type (if type is not None) and the name of every output.

Returns:

The output variables produced by the input model

skl2onnx._parse.parse_sklearn_model(model, initial_types=None, target_opset=None, custom_conversion_functions=None, custom_shape_calculators=None, custom_parsers=None, options=None, white_op=None, black_op=None, final_types=None, naming=None)[source]#

Puts scikit-learn object into an abstract container so that our framework can work seamlessly on models created with different machine learning tools.

Parameters:
  • model – A scikit-learn model

  • initial_types – a python list. Each element is a tuple of a variable name and a type defined in data_types.py

  • target_opset – number, for example, 7 for ONNX 1.2, and 8 for ONNX 1.3.

  • custom_conversion_functions – a dictionary for specifying the user customized conversion function if not registered

  • custom_shape_calculators – a dictionary for specifying the user customized shape calculator if not registered

  • custom_parsers – parsers determine which outputs are expected for a particular task. Default parsers are defined for classifiers, regressors and pipelines, but they can be overridden. custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }

  • options – specific options given to converters (see Converters with options)

  • white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed

  • black_op – black list of ONNX nodes disallowed while converting a pipeline, if empty, none are blacklisted

  • final_types – a python list. Works the same way as initial_types but is not mandatory, it is used to overwrite the type (if type is not None) and the name of every output.

  • naming – the user may want to change the way intermediate results are named. This parameter can be a string (a prefix) or a function whose signature is get_name(name, existing_names). The library then checks that this name is unique and modifies it if not

Returns:

Topology

Changed in version 1.10.0: Parameter naming was added.

Utils for contributors#

skl2onnx.common.utils.check_input_and_output_numbers(operator, input_count_range=None, output_count_range=None)[source]#

Check if the number of input(s)/output(s) is correct

Parameters:
  • operator – An Operator object

  • input_count_range – A list of two integers or an integer. If it’s a list the first/second element is the minimal/maximal number of inputs. If it’s an integer, it is equivalent to specify that number twice in a list. For infinite ranges like 5 to infinity, you need to use [5, None].

  • output_count_range – A list of two integers or an integer. See input_count_range for its format.
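The range format described above can be illustrated with a small stand-alone function. This is not library code: `count_in_range` is a hypothetical helper that only mirrors the documented conventions (an integer n means [n, n], None as a bound means unbounded). Treating a range of None as "no constraint" is an assumption of this sketch.

```python
def count_in_range(count, count_range):
    """Return True if `count` satisfies `count_range`.

    `count_range` is None (assumed here to mean no constraint),
    an integer n (equivalent to [n, n]),
    or a list [low, high] where either bound may be None.
    """
    if count_range is None:
        return True
    if isinstance(count_range, int):
        low = high = count_range
    else:
        low, high = count_range
    if low is not None and count < low:
        return False
    if high is not None and count > high:
        return False
    return True


print(count_in_range(1, 1))          # exactly one input expected
print(count_in_range(7, [5, None]))  # five or more inputs expected
print(count_in_range(3, [1, 2]))     # between one and two inputs expected
```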

skl2onnx.common.utils.check_input_and_output_types(operator, good_input_types=None, good_output_types=None)[source]#

Check if the type(s) of input(s)/output(s) is(are) correct

Parameters:
  • operator – An Operator object

  • good_input_types – A list of allowed input types (e.g., [FloatTensorType, Int64TensorType]) or None. None means that we skip the check of the input types.

  • good_output_types – A list of allowed output types. See good_input_types for its format.

Concepts#

Containers#

class skl2onnx.common._container.SklearnModelContainerNode(sklearn_model, white_op=None, black_op=None, verbose=0)[source]#

Main container for one scikit-learn model. Every converter adds nodes to an existing container which is converted into an ONNX graph by an instance of Topology.

property input_names#

This property should return a list of strings. Each string corresponds to an input variable name.

property output_names#

This property should return a list of strings. Each string corresponds to an output variable name.

class skl2onnx.common._container.ModelComponentContainer(target_opset, options=None, registered_models=None, white_op=None, black_op=None, verbose=0)[source]#

In the conversion phase, this class is used to collect all materials required to build an ONNX GraphProto, which is encapsulated in an ONNX ModelProto.

add_initializer(name, onnx_type, shape, content)[source]#

Adds a TensorProto into the initializer list of the final ONNX model.

Parameters:
  • name – Variable name in the produced ONNX model.

  • onnx_type – Element types allowed in ONNX tensor, e.g., TensorProto.FLOAT and TensorProto.STRING.

  • shape – Tensor shape, a list of integers.

  • content – Flattened tensor values (i.e., a float list or a float array).

Returns:

created tensor

add_input(variable)[source]#

Adds our Variable object defined in _parser.py into the input list of the final ONNX model.

Parameters:

variable – The Variable object to be added

add_node(op_type, inputs, outputs, op_domain='', op_version=None, name=None, **attrs)[source]#

Adds a NodeProto into the node list of the final ONNX model. If the input operator’s domain-version information cannot be found in our domain-version pool (a Python set), we may add it.

Parameters:
  • op_type – A string (e.g., Pool and Conv) indicating the type of the NodeProto

  • inputs – A list of strings. They are the input variables’ names of the considered NodeProto

  • outputs – A list of strings. They are the output variables’ names of the considered NodeProto

  • op_domain – The domain name (e.g., ai.onnx.ml) of the operator we are trying to add.

  • op_version – The version number (e.g., 0 and 1) of the operator we are trying to add.

  • name – name of the node, this name cannot be empty

  • attrs – A Python dictionary. Keys and values are attributes’ names and attributes’ values, respectively.

add_output(variable)[source]#

Adds our Variable object defined in _parser.py into the output list of the final ONNX model.

Parameters:

variable – The Variable object to be added

Nodes#

class skl2onnx.common._topology.Operator(onnx_name, scope, type, raw_operator, target_opset, scope_inst)[source]#

Defines an operator available in ONNX.

class skl2onnx.common._topology.Variable(raw_name, onnx_name, scope, type=None)[source]#

Defines a variable which holds any data defined from ONNX types.

Scope#

class skl2onnx.common._topology.Scope(name, target_opset=None, custom_shape_calculators=None, options=None, registered_models=None, naming=None)[source]#

Every node name in an ONNX graph must be unique. This class holds the list of existing names for every node already defined in the graph. It also provides functions to create a unique unused name.

get_unique_operator_name(seed)[source]#

Creates a unique operator ID based on the given seed.

get_unique_variable_name(seed, rename=True)[source]#

Creates a unique variable ID based on the given seed.
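The idea behind unique name generation can be sketched in a few lines. The function below is not the algorithm used by Scope. `make_unique` and its suffix scheme are hypothetical and only illustrate how a seed can be turned into a name absent from the set of existing names.

```python
def make_unique(seed, existing_names):
    """Return `seed`, or `seed` plus the first free integer suffix.

    The returned name is recorded in `existing_names`.
    """
    candidate = seed
    suffix = 1
    while candidate in existing_names:
        candidate = f"{seed}{suffix}"
        suffix += 1
    existing_names.add(candidate)
    return candidate


existing = {"variable"}
first = make_unique("variable", existing)   # seed taken, a suffix is added
second = make_unique("variable", existing)  # the next free suffix is used
third = make_unique("label", existing)      # unused seed is kept unchanged
print(first, second, third)
```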

Topology#

class skl2onnx.common._topology.Topology(model, default_batch_size=1, initial_types=None, target_opset=None, custom_conversion_functions=None, custom_shape_calculators=None, registered_models=None)[source]#

Holds instances of Scope and SklearnModelContainer. These are filled by the converters while a pipeline is being converted.

skl2onnx.common._topology.convert_topology(topology, model_name, doc_string, target_opset, options=None, remove_identity=True, verbose=0)[source]#

This function is used to convert our Topology object defined in _parser.py into an ONNX model (type: ModelProto).

Parameters:
  • topology – The Topology object we are going to convert

  • model_name – GraphProto’s name. Let “model” denote the returned model. The string “model_name” would be assigned to “model.graph.name.”

  • doc_string – A string attached to the produced model

  • target_opset – number or dictionary, for example, 7 for ONNX 1.2, and 8 for ONNX 1.3, a dictionary is used to indicate different opset for different domains

  • options – see Converters with options

  • remove_identity – removes identity nodes

  • verbose – displays information while converting

Returns:

an ONNX ModelProto