.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_tutorial/plot_dbegin_options_zipmap.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_tutorial_plot_dbegin_options_zipmap.py:


.. _l-tutorial-example-zipmap:

Choose appropriate output of a classifier
=========================================

A scikit-learn classifier usually returns a matrix of probabilities.
By default, *sklearn-onnx* converts that matrix
into a list of dictionaries where each probability is mapped
to its class id or name. That mechanism retains the class names
but is slower. Let's see what other options are available.

Train a model and convert it
++++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 20-44

.. code-block:: Python


    from timeit import repeat
    import numpy
    import sklearn
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    import onnxruntime as rt
    import onnx
    import skl2onnx
    from skl2onnx import to_onnx
    from sklearn.linear_model import LogisticRegression
    from sklearn.multioutput import MultiOutputClassifier

    iris = load_iris()
    X, y = iris.data, iris.target
    X = X.astype(numpy.float32)
    y = y * 2 + 10  # to get labels different from [0, 1, 2]
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clr = LogisticRegression(max_iter=500)
    clr.fit(X_train, y_train)
    print(clr)

    onx = to_onnx(clr, X_train, target_opset=12)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    LogisticRegression(max_iter=500)


.. GENERATED FROM PYTHON SOURCE LINES 45-50

Default behaviour: zipmap=True
++++++++++++++++++++++++++++++

The output type for the probabilities is a list of
dictionaries.

.. GENERATED FROM PYTHON SOURCE LINES 50-57

.. code-block:: Python


    sess = rt.InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])
    res = sess.run(None, {"X": X_test})
    print(res[1][:2])
    print("probabilities type:", type(res[1]))
    print("type for the first observations:", type(res[1][0]))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [{10: 7.888656000432093e-06, 12: 0.028723130002617836, 14: 0.9712689518928528}, {10: 1.6195890850667638e-07, 12: 0.008782343938946724, 14: 0.9912174940109253}]
    probabilities type:
    type for the first observations:


.. GENERATED FROM PYTHON SOURCE LINES 58-62

Option zipmap=False
+++++++++++++++++++

Probabilities are now a matrix.

.. GENERATED FROM PYTHON SOURCE LINES 62-74

.. code-block:: Python


    options = {id(clr): {"zipmap": False}}
    onx2 = to_onnx(clr, X_train, options=options, target_opset=12)

    sess2 = rt.InferenceSession(
        onx2.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    res2 = sess2.run(None, {"X": X_test})
    print(res2[1][:2])
    print("probabilities type:", type(res2[1]))
    print("type for the first observations:", type(res2[1][0]))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[7.8886560e-06 2.8723130e-02 9.7126895e-01]
     [1.6195891e-07 8.7823439e-03 9.9121749e-01]]
    probabilities type:
    type for the first observations:


.. GENERATED FROM PYTHON SOURCE LINES 75-81

Option zipmap='columns'
+++++++++++++++++++++++

This option removes the final operator ZipMap and splits
the probabilities into columns. The final model produces
one output for the label, and one output per class.

.. GENERATED FROM PYTHON SOURCE LINES 81-97

.. code-block:: Python


    options = {id(clr): {"zipmap": "columns"}}
    onx3 = to_onnx(clr, X_train, options=options, target_opset=12)

    sess3 = rt.InferenceSession(
        onx3.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    res3 = sess3.run(None, {"X": X_test})
    for i, out in enumerate(sess3.get_outputs()):
        print(
            "output: '{}' shape={} values={}...".format(
                out.name, res3[i].shape, res3[i][:2]
            )
        )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    output: 'output_label' shape=(38,) values=[14 14]...
    output: 'i10' shape=(38,) values=[7.8886560e-06 1.6195891e-07]...
    output: 'i12' shape=(38,) values=[0.02872313 0.00878234]...
    output: 'i14' shape=(38,) values=[0.97126895 0.9912175 ]...


.. GENERATED FROM PYTHON SOURCE LINES 98-100

Let's compare prediction time
+++++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 100-117

.. code-block:: Python


    print("Average time with ZipMap:")
    print(sum(repeat(lambda: sess.run(None, {"X": X_test}), number=100, repeat=10)) / 10)

    print("Average time without ZipMap:")
    print(sum(repeat(lambda: sess2.run(None, {"X": X_test}), number=100, repeat=10)) / 10)

    print("Average time without ZipMap but with columns:")
    print(sum(repeat(lambda: sess3.run(None, {"X": X_test}), number=100, repeat=10)) / 10)

    # The prediction is much faster without ZipMap
    # on this example.
    # The gain is even larger when the classes
    # are described with strings and not integers,
    # because the final result (list of dictionaries) may copy
    # the same information many times with onnxruntime.


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Average time with ZipMap:
    0.006013637700380059
    Average time without ZipMap:
    0.0018747422996966633
    Average time without ZipMap but with columns:
    0.0029320066998479886


.. GENERATED FROM PYTHON SOURCE LINES 118-125

Option zipmap=False and output_class_labels=True
++++++++++++++++++++++++++++++++++++++++++++++++

Option `zipmap=False` seems a better choice because it is
much faster, but labels are lost in the process. Option
`output_class_labels` can be used to expose the labels
as a third output.

.. GENERATED FROM PYTHON SOURCE LINES 125-137

.. code-block:: Python


    options = {id(clr): {"zipmap": False, "output_class_labels": True}}
    onx4 = to_onnx(clr, X_train, options=options, target_opset=12)

    sess4 = rt.InferenceSession(
        onx4.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    res4 = sess4.run(None, {"X": X_test})
    print(res4[1][:2])
    print("probabilities type:", type(res4[1]))
    print("class labels:", res4[2])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[7.8886560e-06 2.8723130e-02 9.7126895e-01]
     [1.6195891e-07 8.7823439e-03 9.9121749e-01]]
    probabilities type:
    class labels: [10 12 14]


.. GENERATED FROM PYTHON SOURCE LINES 138-139

Processing time.

.. GENERATED FROM PYTHON SOURCE LINES 139-143

.. code-block:: Python


    print("Average time without ZipMap but with output_class_labels:")
    print(sum(repeat(lambda: sess4.run(None, {"X": X_test}), number=100, repeat=10)) / 10)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Average time without ZipMap but with output_class_labels:
    0.0019766365025134292


.. GENERATED FROM PYTHON SOURCE LINES 144-151

MultiOutputClassifier
+++++++++++++++++++++

This model is equivalent to several classifiers, one for every label
to predict. Instead of returning a matrix of probabilities, it returns
a sequence of matrices. Let's first modify the labels to get
a problem for a MultiOutputClassifier.

.. GENERATED FROM PYTHON SOURCE LINES 151-156

.. code-block:: Python


    y = numpy.vstack([y, y + 100]).T
    y[::5, 1] = 1000  # Let's add a fourth class.
    print(y[:5])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[  10 1000]
     [  10  110]
     [  10  110]
     [  10  110]
     [  10  110]]


.. GENERATED FROM PYTHON SOURCE LINES 157-158

Let's train a MultiOutputClassifier.

.. GENERATED FROM PYTHON SOURCE LINES 158-172

.. code-block:: Python


    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clr = MultiOutputClassifier(LogisticRegression(max_iter=500))
    clr.fit(X_train, y_train)
    print(clr)

    onx5 = to_onnx(clr, X_train, target_opset=12)

    sess5 = rt.InferenceSession(
        onx5.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    res5 = sess5.run(None, {"X": X_test[:3]})
    print(res5)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    MultiOutputClassifier(estimator=LogisticRegression(max_iter=500))
    /home/xadupre/github/sklearn-onnx/skl2onnx/_parse.py:582: UserWarning: Option zipmap is ignored for model . Set option zipmap to False to remove this message.
      warnings.warn(
    [array([[ 10, 110],
           [ 14, 114],
           [ 14, 114]], dtype=int64), [array([[9.6331292e-01, 3.6686804e-02, 3.1316745e-07],
           [1.3207483e-04, 1.0198733e-01, 8.9788061e-01],
           [4.0349053e-04, 1.9789213e-01, 8.0170435e-01]], dtype=float32), array([[6.0537982e-01, 6.7578614e-02, 2.8624981e-05, 3.2701299e-01],
           [3.8744797e-04, 1.6307302e-01, 4.3756142e-01, 3.9897814e-01],
           [8.7575515e-04, 2.1151876e-01, 4.4912148e-01, 3.3848402e-01]], dtype=float32)]]


.. GENERATED FROM PYTHON SOURCE LINES 173-175

Option zipmap is ignored. Labels are missing but they can be
added back as a third output.

.. GENERATED FROM PYTHON SOURCE LINES 175-192

.. code-block:: Python


    onx6 = to_onnx(
        clr,
        X_train,
        target_opset=12,
        options={"zipmap": False, "output_class_labels": True},
    )

    sess6 = rt.InferenceSession(
        onx6.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    res6 = sess6.run(None, {"X": X_test[:3]})
    print("predicted labels", res6[0])
    print("predicted probabilities", res6[1])
    print("class labels", res6[2])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    predicted labels [[ 10 110]
     [ 14 114]
     [ 14 114]]
    predicted probabilities [array([[9.6331292e-01, 3.6686804e-02, 3.1316745e-07],
           [1.3207483e-04, 1.0198733e-01, 8.9788061e-01],
           [4.0349053e-04, 1.9789213e-01, 8.0170435e-01]], dtype=float32), array([[6.0537982e-01, 6.7578614e-02, 2.8624981e-05, 3.2701299e-01],
           [3.8744797e-04, 1.6307302e-01, 4.3756142e-01, 3.9897814e-01],
           [8.7575515e-04, 2.1151876e-01, 4.4912148e-01, 3.3848402e-01]], dtype=float32)]
    class labels [array([10, 12, 14], dtype=int64), array([ 110,  112,  114, 1000], dtype=int64)]


.. GENERATED FROM PYTHON SOURCE LINES 193-194

**Versions used for this example**

.. GENERATED FROM PYTHON SOURCE LINES 194-200

.. code-block:: Python


    print("numpy:", numpy.__version__)
    print("scikit-learn:", sklearn.__version__)
    print("onnx: ", onnx.__version__)
    print("onnxruntime: ", rt.__version__)
    print("skl2onnx: ", skl2onnx.__version__)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    numpy: 2.4.1
    scikit-learn: 1.8.0
    onnx:  1.21.0
    onnxruntime:  1.24.0
    skl2onnx:  1.20.0


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.499 seconds)


.. _sphx_glr_download_auto_tutorial_plot_dbegin_options_zipmap.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_dbegin_options_zipmap.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_dbegin_options_zipmap.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_dbegin_options_zipmap.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
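
With `zipmap=False` and `output_class_labels=True`, the model returns
a probability matrix and a vector of class labels instead of a list of
dictionaries. If the dictionary form is still needed for a few rows, it can
be rebuilt on the Python side. The sketch below is only an illustration:
`rebuild_zipmap` is a hypothetical helper, not part of *sklearn-onnx* or
onnxruntime, and the arrays are copied from the outputs printed in this
example rather than produced by a session.

```python
import numpy


def rebuild_zipmap(probabilities, class_labels):
    """Hypothetical helper: map every probability to its class label.

    ``probabilities`` is a 2D array (one row per observation) and
    ``class_labels`` a 1D array with one label per column.
    """
    # Convert numpy scalars to plain Python values so the keys
    # behave like the ones in a regular dictionary.
    labels = [
        label.item() if hasattr(label, "item") else label for label in class_labels
    ]
    return [dict(zip(labels, row.tolist())) for row in numpy.asarray(probabilities)]


# Values copied from the outputs shown in this example
# (second and third outputs of the model converted with
# zipmap=False and output_class_labels=True).
probabilities = numpy.array(
    [
        [7.8886560e-06, 2.8723130e-02, 9.7126895e-01],
        [1.6195891e-07, 8.7823439e-03, 9.9121749e-01],
    ],
    dtype=numpy.float32,
)
class_labels = numpy.array([10, 12, 14], dtype=numpy.int64)

rows = rebuild_zipmap(probabilities, class_labels)
print(rows[0])
```

Building the dictionaries only for the rows that need them keeps the fast
matrix output for everything else.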