API Summary

Summary of public functions and classes exposed in sklearn-onnx.

Version

skl2onnx.get_latest_tested_opset_version()[source]

This module relies on onnxruntime to test every converter. The function returns the most recent target opset tested with onnxruntime, or the opset version supported by the installed onnx package (returned by onnx.defs.onnx_opset_version()) if that one is lower.

Converters

Both functions convert a scikit-learn model into ONNX. The first one lets the user manually define the inputs’ names and types. The second one infers this information from the training data. These two functions are the main entry points to the converter. The rest of the API is needed only if a model has no converter implemented in this package. A new converter then has to be registered, whether it is imported from another package or created from scratch.

skl2onnx.convert_sklearn(model, name=None, initial_types=None, doc_string='', target_opset=None, custom_conversion_functions=None, custom_shape_calculators=None, custom_parsers=None, options=None, intermediate=False, white_op=None, black_op=None, final_types=None, dtype=None)[source]

This function produces an ONNX model equivalent to the given scikit-learn model. The list of supported converters is returned by function supported_converters.

When converting a pipeline, the user needs to make sure each component is one of the supported items. Initial types are required for all conversions. The name of the ONNX model can also be specified.

Parameters
  • model – A scikit-learn model

  • initial_types – a python list. Each element is a tuple of a variable name and a type defined in data_types.py

  • name – The name of the graph (type: GraphProto) in the produced ONNX model (type: ModelProto)

  • doc_string – A string attached onto the produced ONNX model

  • target_opset – number, for example, 7 for ONNX 1.2 and 8 for ONNX 1.3; if not specified, the function chooses the latest tested opset (see skl2onnx.get_latest_tested_opset_version())

  • custom_conversion_functions – a dictionary specifying user-defined conversion functions; it takes precedence over registered converters

  • custom_shape_calculators – a dictionary specifying user-defined shape calculators; it takes precedence over registered shape calculators

  • custom_parsers – parsers determine which outputs are expected for which particular task; default parsers are defined for classifiers, regressors, and pipelines, but they can be overridden. custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }

  • options – specific options given to converters (see Converters with options)

  • intermediate – if True, the function returns the converted model and the instance of Topology used; otherwise it returns only the converted model

  • white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed

  • black_op – black list of ONNX nodes not allowed while converting a pipeline, if empty, none are blacklisted

  • final_types – a python list. Works the same way as initial_types but is not mandatory; it is used to overwrite the type (if type is not None) and the name of every output.

  • dtype – removed in version 1.7.5; dtype is now inferred from input types, and converters may add Cast operators to switch to double when necessary

Returns

An ONNX model (type: ModelProto) which is equivalent to the input scikit-learn model

Example of initial_types: Assume that the specified scikit-learn model takes a heterogeneous list as its input. If the first 5 elements are floats and the last 10 elements are integers, we need to specify initial types as below. The None in [None, 5] indicates that the batch size is unknown.

from skl2onnx.common.data_types import FloatTensorType, Int64TensorType
initial_type = [('float_input', FloatTensorType([None, 5])),
                ('int64_input', Int64TensorType([None, 10]))]

Note

If a pipeline includes an instance of ColumnTransformer, scikit-learn allows the user to specify columns by names. This option is not supported by sklearn-onnx, as feature names could differ between the input data and the ONNX graph (defined by parameter initial_types); only integers are supported.

Some ONNX operators expose parameters that sklearn-onnx cannot guess from the raw model. Default values are usually suggested, but users may have to overwrite them manually. Doing so is not straightforward when a model is included in a pipeline. That’s why these options can be given to function convert_sklearn as a dictionary {model_type: parameters in a dictionary} or {model_id: parameters in a dictionary}. Option separators is used to specify the delimiters between two words when the ONNX graph needs to tokenize a string. The default value is short and may not include all the necessary values. It can be overwritten as:

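from sklearn.feature_extraction.text import TfidfVectorizer
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType

# model is assumed to be a fitted TfidfVectorizer
extra = {TfidfVectorizer: {"separators": [' ', '[.]', '\\?',
            ',', ';', ':', '\\!', '\\(', '\\)']}}
model_onnx = convert_sklearn(model, "tfidf",
                             initial_types=[("input", StringTensorType([None, 1]))],
                             options=extra)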

If a pipeline contains two models of the same class, they can be distinguished with function id:

extra = {id(model): {"separators": [' ', '[.]', '\\?', ',', ';',
            ':', '\\!', '\\(', '\\)']}}
model_onnx = convert_sklearn(pipeline, "pipeline-with-2-tfidf",
                             initial_types=[("input", StringTensorType([None, 1]))],
                             options=extra)

It is used in example TfIdfVectorizer with ONNX.

Changed in version 1.7: Parameter target_opset, if not specified, is now set to the latest tested opset returned by skl2onnx.get_latest_tested_opset_version() and not the version of the onnx package.

skl2onnx.to_onnx(model, X=None, name=None, initial_types=None, target_opset=None, options=None, white_op=None, black_op=None, final_types=None, dtype=None)[source]

Calls convert_sklearn() with simplified parameters.

Parameters
  • model – model to convert

  • X – training set, can be None; it is used to infer the input types (initial_types)

  • initial_types – if X is None, then initial_types must be defined

  • target_opset – conversion with a specific target opset

  • options – specific options given to converters (see Converters with options)

  • name – name of the model

  • white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed

  • black_op – black list of ONNX nodes not allowed while converting a pipeline, if empty, none are blacklisted

  • final_types – a python list. Works the same way as initial_types but is not mandatory; it is used to overwrite the type (if type is not None) and the name of every output.

  • dtype – removed in version 1.7.5; dtype is now inferred from input types, and converters may add Cast operators to switch to double when necessary

Returns

converted model

This function checks whether the model inherits from class OnnxOperatorMixin. If it does, the function calls the model’s method to_onnx; otherwise, it calls convert_sklearn().

Register a new converter

If a model has no converter implemented in this package, a new converter has to be registered, whether it is imported from another package or created from scratch. Section Covered Converters lists all available converters.

skl2onnx.supported_converters(from_sklearn=False)[source]

Returns the list of supported converters. To find the converter associated to a specific model, the library gets the name of the model class, adds 'Sklearn' as a prefix and retrieves the associated converter if available.

Parameters

from_sklearn – every supported model is mapped to converter by a name prefixed with 'Sklearn', the prefix is removed if this parameter is False but the function only returns converters whose name is prefixed by 'Sklearn'

Returns

list of supported models as string

skl2onnx.update_registered_converter(model, alias, shape_fct, convert_fct, overwrite=True, parser=None, options=None)[source]

Registers or updates a converter for a new model so that it can be converted when inserted in a scikit-learn pipeline.

Parameters
  • model – model class

  • alias – alias used to register the model

  • shape_fct – function which checks or modifies the expected outputs; it should be fast so that the whole graph can be computed first, followed by the conversion of each model, parallelized or not

  • convert_fct – function which converts a model

  • overwrite – False to raise an exception if a converter already exists

  • parser – overwrites the parser as well if not empty

  • options – registered options for this converter

The alias is usually the library name followed by the model name. Example:

from sklearn.linear_model import SGDClassifier
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from skl2onnx.operator_converters.RandomForest import convert_sklearn_random_forest_classifier
from skl2onnx import update_registered_converter
update_registered_converter(SGDClassifier, 'SklearnLinearClassifier',
                            calculate_linear_classifier_output_shapes,
                            convert_sklearn_random_forest_classifier,
                            options={'zipmap': [True, False],
                                     'raw_scores': [True, False]})

skl2onnx.update_registered_parser(model, parser_fct)[source]

Registers or updates a parser for a new model. A parser returns the expected output of a model.

Parameters
  • model – model class

  • parser_fct – parser, signature is the same as parse_sklearn

Manipulate ONNX graphs

skl2onnx.helpers.onnx_helper.enumerate_model_node_outputs(model, add_node=False)[source]

Enumerates all the nodes of a model.

Parameters
  • model – ONNX graph

  • add_node – if False, the function enumerates all output names from every node, otherwise, it enumerates tuple (output name, node)

Returns

enumerator

skl2onnx.helpers.onnx_helper.load_onnx_model(onnx_file_or_bytes)[source]

Loads an ONNX file.

Parameters

onnx_file_or_bytes – ONNX file or bytes

Returns

ONNX model

skl2onnx.helpers.onnx_helper.select_model_inputs_outputs(model, outputs=None, inputs=None)[source]

Takes a model and changes its outputs.

Parameters
  • model – ONNX model

  • inputs – new inputs

  • outputs – new outputs

Returns

modified model

The function removes unneeded nodes.

skl2onnx.helpers.onnx_helper.save_onnx_model(model, filename=None)[source]

Saves a model as a file or bytes.

Parameters
  • model – ONNX model

  • filename – filename or None to return bytes

Returns

bytes

Parsers

skl2onnx._parse.parse_sklearn(scope, model, inputs, custom_parsers=None, final_types=None)[source]

This is a delegate function. It does nothing but invoke the correct parsing function according to the input model’s type.

Parameters
  • scope – Scope object

  • model – A scikit-learn object (e.g., OneHotEncoder and LogisticRegression)

  • inputs – A list of variables

  • custom_parsers – parsers determine which outputs are expected for which particular task; default parsers are defined for classifiers, regressors, and pipelines, but they can be overridden. custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }

  • final_types – a python list. Works the same way as initial_types but is not mandatory; it is used to overwrite the type (if type is not None) and the name of every output.

Returns

The output variables produced by the input model

skl2onnx._parse.parse_sklearn_model(model, initial_types=None, target_opset=None, custom_conversion_functions=None, custom_shape_calculators=None, custom_parsers=None, options=None, white_op=None, black_op=None, final_types=None)[source]

Puts scikit-learn object into an abstract container so that our framework can work seamlessly on models created with different machine learning tools.

Parameters
  • model – A scikit-learn model

  • initial_types – a python list. Each element is a tuple of a variable name and a type defined in data_types.py

  • target_opset – number, for example, 7 for ONNX 1.2, and 8 for ONNX 1.3.

  • custom_conversion_functions – a dictionary for specifying the user customized conversion function if not registered

  • custom_shape_calculators – a dictionary for specifying the user customized shape calculator if not registered

  • custom_parsers – parsers determine which outputs are expected for which particular task; default parsers are defined for classifiers, regressors, and pipelines, but they can be overridden. custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }

  • options – specific options given to converters (see Converters with options)

  • white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed

  • black_op – black list of ONNX nodes not allowed while converting a pipeline, if empty, none are blacklisted

  • final_types – a python list. Works the same way as initial_types but is not mandatory; it is used to overwrite the type (if type is not None) and the name of every output.

Returns

Topology

Utils for contributors

skl2onnx.common.utils.check_input_and_output_numbers(operator, input_count_range=None, output_count_range=None)[source]

Check if the number of input(s)/output(s) is correct

Parameters
  • operator – An Operator object

  • input_count_range – A list of two integers or an integer. If it’s a list, the first/second element is the minimal/maximal number of inputs. If it’s an integer, it is equivalent to specifying that number twice in a list. For infinite ranges like 5 to infinity, use [5, None].

  • output_count_range – A list of two integers or an integer. See input_count_range for its format.
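The range format can be illustrated with a small stand-alone function. count_in_range is a hypothetical helper written for this page, not the library’s implementation; treating a missing range or a None lower bound as "no constraint" is an assumption of the sketch.

```python
def count_in_range(count, count_range):
    """Return True if count satisfies count_range.

    An int n means exactly n, a two-element list [low, high] is an
    inclusive range, and a None bound means no limit on that side.
    """
    if count_range is None:
        return True  # no constraint given (assumption of this sketch)
    if isinstance(count_range, int):
        count_range = [count_range, count_range]
    low, high = count_range
    if low is not None and count < low:
        return False
    if high is not None and count > high:
        return False
    return True


print(count_in_range(2, 2))          # True
print(count_in_range(7, [5, None]))  # True
print(count_in_range(3, [1, 2]))     # False
```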

skl2onnx.common.utils.check_input_and_output_types(operator, good_input_types=None, good_output_types=None)[source]

Check if the type(s) of input(s)/output(s) is(are) correct

Parameters
  • operator – An Operator object

  • good_input_types – A list of allowed input types (e.g., [FloatTensorType, Int64TensorType]) or None. None means that we skip the check of the input types.

  • good_output_types – A list of allowed output types. See good_input_types for its format.

Concepts

Containers

class skl2onnx.common._container.SklearnModelContainerNode(sklearn_model, white_op=None, black_op=None)[source]

Main container for one scikit-learn model. Every converter adds nodes to an existing container, which is converted into an ONNX graph by an instance of Topology.

property input_names

Returns a list of strings. Each string corresponds to an input variable name.

property output_names

Returns a list of strings. Each string corresponds to an output variable name.

class skl2onnx.common._container.ModelComponentContainer(target_opset, options=None, registered_models=None, white_op=None, black_op=None)[source]

In the conversion phase, this class is used to collect all materials required to build an ONNX GraphProto, which is encapsulated in an ONNX ModelProto.

add_initializer(name, onnx_type, shape, content)[source]

Adds a TensorProto into the initializer list of the final ONNX model.

Parameters
  • name – Variable name in the produced ONNX model.

  • onnx_type – Element types allowed in ONNX tensor, e.g., TensorProto.FLOAT and TensorProto.STRING.

  • shape – Tensor shape, a list of integers.

  • content – Flattened tensor values (i.e., a float list or a float array).

Returns

created tensor

add_input(variable)[source]

Adds our Variable object defined in _parser.py into the input list of the final ONNX model.

Parameters

variable – The Variable object to be added

add_node(op_type, inputs, outputs, op_domain='', op_version=None, name=None, **attrs)[source]

Adds a NodeProto into the node list of the final ONNX model. If the input operator’s domain-version information cannot be found in our domain-version pool (a Python set), we may add it.

Parameters
  • op_type – A string (e.g., Pool and Conv) indicating the type of the NodeProto

  • inputs – A list of strings. They are the input variables’ names of the considered NodeProto

  • outputs – A list of strings. They are the output variables’ names of the considered NodeProto

  • op_domain – The domain name (e.g., ai.onnx.ml) of the operator we are trying to add.

  • op_version – The version number (e.g., 0 and 1) of the operator we are trying to add.

  • name – name of the node, this name cannot be empty

  • attrs – A Python dictionary. Keys and values are attributes’ names and attributes’ values, respectively.

add_output(variable)[source]

Adds our Variable object defined in _parser.py into the output list of the final ONNX model.

Parameters

variable – The Variable object to be added

Nodes

class skl2onnx.common._topology.Operator(onnx_name, scope, type, raw_operator, target_opset, scope_inst)[source]

Defines an operator available in ONNX.

class skl2onnx.common._topology.Variable(raw_name, onnx_name, scope, type=None)[source]

Defines a variable which holds any data defined from ONNX types.

Scope

class skl2onnx.common._topology.Scope(name, parent_scopes=None, variable_name_set=None, operator_name_set=None, target_opset=None, custom_shape_calculators=None, options=None, registered_models=None)[source]

Every node of an ONNX graph must have a unique name. This class holds the list of existing names for every node already defined in the graph. It also provides functions to create a unique unused name.

get_unique_operator_name(seed)[source]

Creates a unique operator ID based on the given seed.

get_unique_variable_name(seed)[source]

Creates a unique variable ID based on the given seed.
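The idea of deriving a unique name from a seed can be sketched as follows. NameRegistry is a hypothetical stand-in written for this page; the suffix scheme is an assumption and the names actually produced by Scope may differ.

```python
class NameRegistry:
    """Hypothetical stand-in for the naming role of Scope."""

    def __init__(self):
        self._used = set()

    def get_unique_name(self, seed):
        # Use the seed itself when it is free, otherwise append the
        # first integer suffix that yields an unused name.
        name = seed
        suffix = 1
        while name in self._used:
            name = f"{seed}{suffix}"
            suffix += 1
        self._used.add(name)
        return name


registry = NameRegistry()
print(registry.get_unique_name("scaled"))  # scaled
print(registry.get_unique_name("scaled"))  # scaled1
print(registry.get_unique_name("scaled"))  # scaled2
```

Keeping every generated name in one place is what allows several converters to request names from the same seed without colliding.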

Topology

class skl2onnx.common._topology.Topology(model, default_batch_size=1, initial_types=None, reserved_variable_names=None, reserved_operator_names=None, target_opset=None, custom_conversion_functions=None, custom_shape_calculators=None, registered_models=None)[source]

Holds instances of Scope and SklearnModelContainer. These are filled by the converters while a pipeline is being converted. Once all converters have been called, method Topology.compile must be called to convert the topological graph into an ONNX graph.

compile()[source]

This function aims at giving every operator enough information so that all operator conversions can happen independently. We also want to check, fix, and simplify the network structure here.