API Summary#
Summary of public functions and classes exposed in sklearn-onnx.
Version#
Converters#
Both functions convert a scikit-learn model into ONNX. The first one lets the user manually define the input names and types. The second one infers this information from the training data. These two functions are the main entry points to the converters. The rest of the API is only needed if a model has no converter implemented in this package. A new converter then has to be registered, whether it is imported from another package or created from scratch.
- skl2onnx.convert_sklearn(model, name=None, initial_types=None, doc_string='', target_opset=None, custom_conversion_functions=None, custom_shape_calculators=None, custom_parsers=None, options=None, intermediate=False, white_op=None, black_op=None, final_types=None, dtype=None, naming=None, model_optim=True, verbose=0)[source]#
This function produces an equivalent ONNX model of the given scikit-learn model. The list of supported converters is returned by function supported_converters. For pipeline conversion, the user needs to make sure each component is one of the supported items. Note that for all conversions, initial types are required. The ONNX model name can also be specified.
- Parameters:
model – A scikit-learn model
initial_types – a python list. Each element is a tuple of a variable name and a type defined in data_types.py
name – The name of the graph (type: GraphProto) in the produced ONNX model (type: ModelProto)
doc_string – A string attached onto the produced ONNX model
target_opset – number, for example, 7 for ONNX 1.2, and 8 for ONNX 1.3; if the value is not specified, the function chooses the latest tested opset (see skl2onnx.get_latest_tested_opset_version())
custom_conversion_functions – a dictionary specifying user-customized conversion functions; it takes precedence over registered converters
custom_shape_calculators – a dictionary specifying user-customized shape calculators; it takes precedence over registered shape calculators
custom_parsers – parsers determine which outputs are expected for which particular task; default parsers are defined for classifiers, regressors and pipelines but they can be rewritten; custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }
options – specific options given to converters (see Converters with options)
intermediate – if True, the function returns the converted model and the instance of Topology used; otherwise it returns only the converted model
white_op – white list of ONNX nodes allowed while converting a pipeline; if empty, all are allowed
black_op – black list of ONNX nodes not allowed while converting a pipeline; if empty, none are blacklisted
final_types – a python list. Works the same way as initial_types but is not mandatory; it is used to overwrite the type (if type is not None) and the name of every output.
dtype – removed in version 1.7.5, dtype is now inferred from input types, converters may add operators Cast to switch to double when it is necessary
naming – the user may want to change the way intermediate results are named; this parameter can be a string (a prefix) or a function whose signature is get_name(name, existing_names); the library then checks that this name is unique and modifies it if not
model_optim – enable or disable model optimisation after the model was converted into ONNX; it reduces the number of identity nodes
verbose – display progress while converting a model
- Returns:
An ONNX model (type: ModelProto) which is equivalent to the input scikit-learn model
Example of initial_types: Assume that the specified scikit-learn model takes a heterogeneous list as its input. If the first 5 elements are floats and the last 10 elements are integers, we need to specify initial types as below. The [None] in [None, 5] indicates the batch size here is unknown.
from skl2onnx.common.data_types import FloatTensorType, Int64TensorType
initial_type = [('float_input', FloatTensorType([None, 5])),
                ('int64_input', Int64TensorType([None, 10]))]
Note
If a pipeline includes an instance of ColumnTransformer, scikit-learn allows the user to specify columns by names. This option is not supported by sklearn-onnx as feature names could be different in the input data and the ONNX graph (defined by parameter initial_types); only integers are supported.
Converters options#
Some ONNX operators expose parameters sklearn-onnx cannot guess from the raw model. Default values are usually suggested but users may have to overwrite them manually. Doing so is not obvious when a model is included in a pipeline. That’s why these options can be given to function convert_sklearn as a dictionary {model_type: parameters in a dictionary} or {model_id: parameters in a dictionary}. Option separators is used to specify the delimiters between two words when the ONNX graph needs to tokenize a string. The default value is short and may not include all the necessary values. It can be overwritten as:
extra = {TfidfVectorizer: {"separators": [' ', '[.]', '\\?', ',', ';', ':',
                                          '\\!', '\\(', '\\)']}}
model_onnx = convert_sklearn(
    model, "tfidf",
    initial_types=[("input", StringTensorType([None, 1]))],
    options=extra)
But if a pipeline contains two models of the same class, it is possible to distinguish between the two with function id:
extra = {id(model): {"separators": [' ', '.', '\\?', ',', ';', ':',
                                    '\\!', '\\(', '\\)']}}
model_onnx = convert_sklearn(
    pipeline, "pipeline-with-2-tfidf",
    initial_types=[("input", StringTensorType([None, 1]))],
    options=extra)
It is used in example TfIdfVectorizer with ONNX.
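How a dictionary keyed by both a class and an id might be resolved for one model can be sketched as follows. This is an illustration only, not the library's code: resolve_options and the stand-in TfidfVectorizer class are hypothetical, and the rule that options keyed by id(model) override those keyed by the class is an assumption.

```python
class TfidfVectorizer:
    """Stand-in for sklearn.feature_extraction.text.TfidfVectorizer."""


def resolve_options(model, options):
    """Merge class-wide options with instance-specific ones (hypothetical helper)."""
    resolved = dict(options.get(type(model), {}))
    # Assumption: options registered under id(model) take precedence.
    resolved.update(options.get(id(model), {}))
    return resolved


first, second = TfidfVectorizer(), TfidfVectorizer()
extra = {
    TfidfVectorizer: {"separators": [" ", ","]},
    id(second): {"separators": [" ", ",", ";"]},
}

print(resolve_options(first, extra))
print(resolve_options(second, extra))
```

Keying by class keeps the dictionary short when every instance shares the same settings; keying by id is only needed when two instances of the same class must differ.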
Changed in version 1.10.0: Parameter naming was added.
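A naming function with the signature get_name(name, existing_names) described above could look like the sketch below. The prefix "tfidf_" and the counter scheme are arbitrary choices made for illustration; the library is documented to check uniqueness itself, so the loop here is only a precaution.

```python
def get_name(name, existing_names):
    """Prefix an intermediate name, adding a counter if the result is taken."""
    candidate = f"tfidf_{name}"
    index = 1
    while candidate in existing_names:
        candidate = f"tfidf_{name}_{index}"
        index += 1
    return candidate


existing = {"tfidf_variable"}
print(get_name("output", existing))
print(get_name("variable", existing))
```

Such a function would be passed as naming=get_name. When only a prefix is needed, the string form of the parameter is shorter.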
- skl2onnx.to_onnx(model, X=None, name=None, initial_types=None, target_opset=None, options=None, white_op=None, black_op=None, final_types=None, dtype=None, naming=None, model_optim=True, verbose=0)[source]#
Calls convert_sklearn() with simplified parameters.
- Parameters:
model – model to convert
X – training set, can be None; it is used to infer the input types (initial_types)
initial_types – if X is None, then initial_types must be defined
target_opset – conversion with a specific target opset
options – specific options given to converters (see Converters with options)
name – name of the model
white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed
black_op – black list of ONNX nodes not allowed while converting a pipeline; if empty, none are blacklisted
final_types – a python list. Works the same way as initial_types but is not mandatory; it is used to overwrite the type (if type is not None) and the name of every output.
dtype – removed in version 1.7.5, dtype is now inferred from input types, converters may add operators Cast to switch to double when it is necessary
naming – the user may want to change the way intermediate results are named; this parameter can be a string (a prefix) or a function whose signature is get_name(name, existing_names); the library then checks that this name is unique and modifies it if not
model_optim – enable or disable model optimisation after the model was converted into ONNX; it reduces the number of identity nodes
verbose – display progress while converting a model
- Returns:
converted model
This function checks if the model inherits from class OnnxOperatorMixin; in that case it calls method to_onnx, otherwise it calls convert_sklearn().
Changed in version 1.10.0: Parameter naming was added.
Logging#
The conversion of a pipeline fails if it contains an object without any associated converter. It may also fail if one of the objects is mapped by a custom converter. If the error message is not explicit enough, it is possible to enable logging:
import logging
logger = logging.getLogger('skl2onnx')
logger.setLevel(logging.DEBUG)
Example Logging, verbose illustrates what it looks like.
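Setting the level on the logger is not enough on its own when no handler is configured: Python's last-resort handler only emits warnings and above. A minimal sketch that also attaches a handler is shown below; the format string is an arbitrary choice.

```python
import logging
import sys

# Send records to stderr with the logger name and level in front.
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

logger = logging.getLogger("skl2onnx")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

# Stand-in for a message the library would emit during a conversion.
logger.debug("debug logging is enabled")
```

Calling logging.basicConfig(level=logging.DEBUG) instead would have a similar effect but enables debug output for every library using the root logger.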
Register a new converter#
If a model has no converter implemented in this package, a new converter then has to be registered, whether it is imported from another package or created from scratch. Section Covered Converters lists all available converters.
- skl2onnx.supported_converters(from_sklearn=False)[source]#
Returns the list of supported converters. To find the converter associated to a specific model, the library gets the name of the model class, adds 'Sklearn' as a prefix and retrieves the associated converter if available.
- Parameters:
from_sklearn – every supported model is mapped to a converter by a name prefixed with 'Sklearn'; the prefix is removed if this parameter is False, but the function only returns converters whose name is prefixed by 'Sklearn'
- Returns:
list of supported models as strings
- skl2onnx.update_registered_converter(model, alias, shape_fct, convert_fct, overwrite=True, parser=None, options=None)[source]#
Registers or updates a converter for a new model so that it can be converted when inserted in a scikit-learn pipeline.
- Parameters:
model – model class
alias – alias used to register the model
shape_fct – function which checks or modifies the expected outputs; this function should be fast so that the whole graph can be computed before each model is converted, parallelized or not
convert_fct – function which converts a model
overwrite – False to raise exception if a converter already exists
parser – overwrites the parser as well if not empty
options – registered options for this converter
The alias is usually the library name followed by the model name. Example:
from sklearn.linear_model import SGDClassifier
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from skl2onnx.operator_converters.RandomForest import convert_sklearn_random_forest_classifier
from skl2onnx import update_registered_converter
update_registered_converter(
    SGDClassifier, 'SklearnLinearClassifier',
    calculate_linear_classifier_output_shapes,
    convert_sklearn_random_forest_classifier,
    options={'zipmap': [True, False, 'columns'],
             'output_class_labels': [False, True],
             'raw_scores': [True, False]})
The function does not update the parser unless one is specified or option ‘zipmap’ is added to the list of options. Every classifier must declare this option to let the default parser automatically handle it.
- skl2onnx.update_registered_parser(model, parser_fct)[source]#
Registers or updates a parser for a new model. A parser returns the expected output of a model.
- Parameters:
model – model class
parser_fct – parser, signature is the same as parse_sklearn
Manipulate ONNX graphs#
- skl2onnx.helpers.onnx_helper.enumerate_model_node_outputs(model, add_node=False)[source]#
Enumerates all the nodes of a model.
- Parameters:
model – ONNX graph
add_node – if False, the function enumerates all output names from every node, otherwise, it enumerates tuples (output name, node)
- Returns:
enumerator
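What the enumeration yields for each value of add_node can be sketched without ONNX installed. The function below is an illustrative re-implementation, not the library's code, and it assumes the model exposes graph.node with an output list on each node, as an ONNX ModelProto does; the SimpleNamespace objects are stand-ins for real protobuf messages.

```python
from types import SimpleNamespace


def enumerate_outputs(model, add_node=False):
    """Yield every node output name, optionally paired with its node."""
    for node in model.graph.node:
        for out in node.output:
            yield (out, node) if add_node else out


# Stand-in for a two-node ONNX graph.
scaler = SimpleNamespace(op_type="Scaler", output=["scaled"])
classifier = SimpleNamespace(op_type="LinearClassifier",
                             output=["label", "probabilities"])
model = SimpleNamespace(graph=SimpleNamespace(node=[scaler, classifier]))

print(list(enumerate_outputs(model)))
print([(name, node.op_type) for name, node in enumerate_outputs(model, add_node=True)])
```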
- skl2onnx.helpers.onnx_helper.load_onnx_model(onnx_file_or_bytes)[source]#
Loads an ONNX file.
- Parameters:
onnx_file_or_bytes – ONNX file or bytes
- Returns:
ONNX model
Parsers#
- skl2onnx._parse.parse_sklearn(scope, model, inputs, custom_parsers=None, final_types=None)[source]#
This is a delegate function. It does nothing but invoke the correct parsing function according to the input model’s type.
- Parameters:
scope – Scope object
model – A scikit-learn object (e.g., OneHotEncoder and LogisticRegression)
inputs – A list of variables
custom_parsers – parsers determine which outputs are expected for which particular task; default parsers are defined for classifiers, regressors and pipelines but they can be rewritten; custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }
final_types – a python list. Works the same way as initial_types but is not mandatory; it is used to overwrite the type (if type is not None) and the name of every output.
- Returns:
The output variables produced by the input model
- skl2onnx._parse.parse_sklearn_model(model, initial_types=None, target_opset=None, custom_conversion_functions=None, custom_shape_calculators=None, custom_parsers=None, options=None, white_op=None, black_op=None, final_types=None, naming=None)[source]#
Puts a scikit-learn object into an abstract container so that our framework can work seamlessly on models created with different machine learning tools.
- Parameters:
model – A scikit-learn model
initial_types – a python list. Each element is a tuple of a variable name and a type defined in data_types.py
target_opset – number, for example, 7 for ONNX 1.2, and 8 for ONNX 1.3.
custom_conversion_functions – a dictionary for specifying the user customized conversion function if not registered
custom_shape_calculators – a dictionary for specifying the user customized shape calculator if not registered
custom_parsers – parsers determine which outputs are expected for which particular task; default parsers are defined for classifiers, regressors and pipelines but they can be rewritten; custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }
options – specific options given to converters (see Converters with options)
white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed
black_op – black list of ONNX nodes not allowed while converting a pipeline; if empty, none are blacklisted
final_types – a python list. Works the same way as initial_types but is not mandatory; it is used to overwrite the type (if type is not None) and the name of every output.
naming – the user may want to change the way intermediate results are named; this parameter can be a string (a prefix) or a function whose signature is get_name(name, existing_names); the library then checks that this name is unique and modifies it if not
- Returns:
Changed in version 1.10.0: Parameter naming was added.
Utils for contributors#
- skl2onnx.common.utils.check_input_and_output_numbers(operator, input_count_range=None, output_count_range=None)[source]#
Check if the number of input(s)/output(s) is correct
- Parameters:
operator – An Operator object
input_count_range – A list of two integers or an integer. If it’s a list, the first/second element is the minimal/maximal number of inputs. If it’s an integer, it is equivalent to specifying that number twice in a list. For infinite ranges like 5 to infinity, you need to use [5, None].
output_count_range – A list of two integers or an integer. See input_count_range for its format.
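The range format described above can be illustrated with a small standalone check. count_in_range is a hypothetical helper written for this sketch, not a function of the library, and treating None as "no constraint" is an assumption based on the default value of the parameters.

```python
def count_in_range(count, count_range):
    """Return True if count satisfies a range given as int, [min, max] or None."""
    if count_range is None:
        return True  # assumption: no range means no check
    if isinstance(count_range, int):
        minimum = maximum = count_range
    else:
        minimum, maximum = count_range
    if minimum is not None and count < minimum:
        return False
    if maximum is not None and count > maximum:
        return False
    return True


print(count_in_range(2, 2))          # exactly two inputs
print(count_in_range(7, [5, None]))  # five inputs or more
print(count_in_range(3, [1, 2]))     # between one and two inputs
```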
- skl2onnx.common.utils.check_input_and_output_types(operator, good_input_types=None, good_output_types=None)[source]#
Check if the type(s) of input(s)/output(s) is(are) correct
- Parameters:
operator – An Operator object
good_input_types – A list of allowed input types (e.g., [FloatTensorType, Int64TensorType]) or None. None means that we skip the check of the input types.
good_output_types – A list of allowed output types. See good_input_types for its format.
Concepts#
Containers#
- class skl2onnx.common._container.SklearnModelContainerNode(sklearn_model, white_op=None, black_op=None, verbose=0)[source]#
Main container for one scikit-learn model. Every converter adds nodes to an existing container which is converted into an ONNX graph by an instance of Topology.
- property input_names#
This function should return a list of strings. Each string corresponds to an input variable name.
- Returns:
a list of strings
- property output_names#
This function should return a list of strings. Each string corresponds to an output variable name.
- Returns:
a list of strings
- class skl2onnx.common._container.ModelComponentContainer(target_opset, options=None, registered_models=None, white_op=None, black_op=None, verbose=0)[source]#
In the conversion phase, this class is used to collect all materials required to build an ONNX GraphProto, which is encapsulated in an ONNX ModelProto.
- add_initializer(name, onnx_type, shape, content)[source]#
Adds a TensorProto into the initializer list of the final ONNX model.
- Parameters:
name – Variable name in the produced ONNX model.
onnx_type – Element types allowed in ONNX tensor, e.g., TensorProto.FLOAT and TensorProto.STRING.
shape – Tensor shape, a list of integers.
content – Flattened tensor values (i.e., a float list or a float array).
- Returns:
created tensor
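The shape and content arguments for a constant can be derived from a nested list as sketched below. flatten_row_major is a hypothetical helper, not part of the library; it assumes a non-empty rectangular nested list and flattens it in row-major order, which is the layout ONNX tensors use.

```python
def flatten_row_major(nested):
    """Return (shape, flat values) for a non-empty rectangular nested list."""
    shape = []
    level = nested
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0]
    flat = nested
    # Remove one nesting level per extra dimension.
    for _ in shape[1:]:
        flat = [value for row in flat for value in row]
    return shape, flat


weights = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
shape, content = flatten_row_major(weights)
print(shape, content)
# These values would then be passed as the shape and content arguments,
# together with a name and an element type such as TensorProto.FLOAT.
```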
- add_input(variable)[source]#
Adds our Variable object defined in _parser.py into the input list of the final ONNX model.
- Parameters:
variable – The Variable object to be added
- add_node(op_type, inputs, outputs, op_domain='', op_version=None, name=None, **attrs)[source]#
Adds a NodeProto into the node list of the final ONNX model. If the input operator’s domain-version information cannot be found in our domain-version pool (a Python set), we may add it.
- Parameters:
op_type – A string (e.g., Pool and Conv) indicating the type of the NodeProto
inputs – A list of strings. They are the input variables’ names of the considered NodeProto
outputs – A list of strings. They are the output variables’ names of the considered NodeProto
op_domain – The domain name (e.g., ai.onnx.ml) of the operator we are trying to add.
op_version – The version number (e.g., 0 and 1) of the operator we are trying to add.
name – name of the node, this name cannot be empty
attrs – A Python dictionary. Keys and values are attributes’ names and attributes’ values, respectively.
Nodes#
Scope#
- class skl2onnx.common._topology.Scope(name, target_opset=None, custom_shape_calculators=None, options=None, registered_models=None, naming=None)[source]#
Every node of an ONNX graph must have a unique name. This class holds the list of existing names for every node already defined in the graph. It also provides functions to create a unique unused name.
Topology#
- class skl2onnx.common._topology.Topology(model, default_batch_size=1, initial_types=None, target_opset=None, custom_conversion_functions=None, custom_shape_calculators=None, registered_models=None)[source]#
Holds instances of Scope and SklearnModelContainer. These are filled by the converters while a pipeline is being converted.
- skl2onnx.common._topology.convert_topology(topology, model_name, doc_string, target_opset, options=None, remove_identity=True, verbose=0)[source]#
This function is used to convert our Topology object defined in _parser.py into an ONNX model (type: ModelProto).
- Parameters:
topology – The Topology object we are going to convert
model_name – GraphProto’s name. Let “model” denote the returned model. The string “model_name” would be assigned to “model.graph.name.”
doc_string – A string attached to the produced model
target_opset – number or dictionary, for example, 7 for ONNX 1.2, and 8 for ONNX 1.3, a dictionary is used to indicate different opset for different domains
options – see Converters with options
remove_identity – if True, removes identity nodes
verbose – displays information while converting
- Returns:
an ONNX ModelProto