
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_benchmark_pipeline.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_benchmark_pipeline.py:


Benchmark a pipeline
====================

This example checks every step of a pipeline: it compares the
predictions of scikit-learn with those of the converted ONNX model,
then benchmarks both.

Create a pipeline
+++++++++++++++++

We reuse the pipeline implemented in the example
`Pipelining: chaining a PCA and a logistic regression `_.
There is one change because
`ONNX-ML Imputer `_
does not handle string types: that step cannot be part of the final
ONNX pipeline and must be removed.
Look for the comment starting with ``---`` below.

.. GENERATED FROM PYTHON SOURCE LINES 23-51

.. code-block:: Python


    import skl2onnx
    import onnx
    import sklearn
    import numpy
    from skl2onnx.helpers import collect_intermediate_steps
    from timeit import timeit
    from skl2onnx.helpers import compare_objects
    import onnxruntime as rt
    from onnxconverter_common.data_types import FloatTensorType
    from skl2onnx import convert_sklearn
    import numpy as np
    import pandas as pd
    from sklearn import datasets
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    logistic = LogisticRegression()
    pca = PCA()
    pipe = Pipeline(steps=[("pca", pca), ("logistic", logistic)])

    digits = datasets.load_digits()
    X_digits = digits.data[:1000]
    y_digits = digits.target[:1000]

    pipe.fit(X_digits, y_digits)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Pipeline(steps=[('pca', PCA()), ('logistic', LogisticRegression())])
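
To make the two stages of the pipeline concrete, here is a minimal sketch, assuming the same digits subset and default estimators as above. It runs the PCA step and the logistic regression step by hand through ``named_steps`` and checks that chaining them reproduces ``pipe.predict_proba``. The names ``X_pca``, ``manual`` and ``chained`` are illustrative only and do not appear in the original example.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Same data subset and default estimators as the example above.
digits = datasets.load_digits()
X_digits = digits.data[:1000]
y_digits = digits.target[:1000]

pipe = Pipeline(steps=[("pca", PCA()), ("logistic", LogisticRegression())])
pipe.fit(X_digits, y_digits)

# Step 1: the fitted PCA projects the raw pixels onto its components.
X_pca = pipe.named_steps["pca"].transform(X_digits[:2])

# Step 2: the fitted logistic regression scores the projected features.
manual = pipe.named_steps["logistic"].predict_proba(X_pca)

# The pipeline performs exactly these two calls in sequence.
chained = pipe.predict_proba(X_digits[:2])
np.testing.assert_allclose(manual, chained)
print(manual.shape)
```

Each intermediate result (``X_pca``, then the probabilities) is the kind of per-step output that the rest of this example compares against ONNX.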


.. GENERATED FROM PYTHON SOURCE LINES 52-54

Conversion to ONNX
++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 54-69

.. code-block:: Python


    initial_types = [("input", FloatTensorType((None, X_digits.shape[1])))]
    model_onnx = convert_sklearn(pipe, initial_types=initial_types, target_opset=12)

    sess = rt.InferenceSession(
        model_onnx.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    print("skl predict_proba")
    print(pipe.predict_proba(X_digits[:2]))
    onx_pred = sess.run(None, {"input": X_digits[:2].astype(np.float32)})[1]
    df = pd.DataFrame(onx_pred)
    print("onnx predict_proba")
    print(df.values)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    skl predict_proba
    [[9.99998530e-01 7.81608916e-19 4.87445989e-10 1.79842282e-08
      3.58700554e-10 1.18138025e-06 4.14411051e-08 1.48275027e-07
      2.50162860e-08 5.51240034e-08]
     [1.37889361e-14 9.99999324e-01 9.17867392e-11 8.30390364e-13
      2.57277805e-07 8.84035071e-12 5.11781429e-11 2.83346408e-11
      4.18965301e-07 1.32796353e-13]]
    onnx predict_proba
    [[9.99998569e-01 7.81611026e-19 4.87444585e-10 1.79842026e-08
      3.58700042e-10 1.18137689e-06 4.14409520e-08 1.48274751e-07
      2.50162131e-08 5.51239410e-08]
     [1.37888807e-14 9.99999344e-01 9.17865159e-11 8.30387679e-13
      2.57277748e-07 8.84032951e-12 5.11779785e-11 2.83345725e-11
      4.18964021e-07 1.32796280e-13]]


.. GENERATED FROM PYTHON SOURCE LINES 70-72

Comparing outputs
+++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 72-76

.. code-block:: Python


    compare_objects(pipe.predict_proba(X_digits[:2]), onx_pred)
    # No exception is raised, so both outputs agree within the comparison tolerance.


.. GENERATED FROM PYTHON SOURCE LINES 77-79

Benchmarks
++++++++++

Each figure printed below is the total time in seconds for 10000 calls
on a single observation.

.. GENERATED FROM PYTHON SOURCE LINES 79-91

.. code-block:: Python


    print("scikit-learn")
    print(timeit("pipe.predict_proba(X_digits[:1])", number=10000, globals=globals()))
    print("onnxruntime")
    print(
        timeit(
            "sess.run(None, {'input': X_digits[:1].astype(np.float32)})[1]",
            number=10000,
            globals=globals(),
        )
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    scikit-learn
    2.0426334000003408
    onnxruntime
    0.2637577000004967


.. GENERATED FROM PYTHON SOURCE LINES 92-100

Intermediate steps
++++++++++++++++++

Suppose the final output is wrong and we need to find out which
component of the pipeline is failing.
The following function modifies the scikit-learn pipeline to capture
the intermediate outputs and produces a smaller ONNX graph for every
operator.

.. GENERATED FROM PYTHON SOURCE LINES 100-144

.. code-block:: Python


    steps = collect_intermediate_steps(pipe, "pipeline", initial_types)

    assert len(steps) == 2

    pipe.predict_proba(X_digits[:2])

    for i, step in enumerate(steps):
        onnx_step = step["onnx_step"]
        sess = rt.InferenceSession(
            onnx_step.SerializeToString(), providers=["CPUExecutionProvider"]
        )
        onnx_outputs = sess.run(None, {"input": X_digits[:2].astype(np.float32)})
        skl_outputs = step["model"]._debug.outputs
        if "transform" in skl_outputs:
            compare_objects(skl_outputs["transform"], onnx_outputs[0])
            print("benchmark", step["model"].__class__)
            print("scikit-learn")
            print(
                timeit(
                    "step['model'].transform(X_digits[:1])", number=10000, globals=globals()
                )
            )
        else:
            compare_objects(skl_outputs["predict_proba"], onnx_outputs[1])
            print("benchmark", step["model"].__class__)
            print("scikit-learn")
            print(
                timeit(
                    "step['model'].predict_proba(X_digits[:1])",
                    number=10000,
                    globals=globals(),
                )
            )
        print("onnxruntime")
        print(
            timeit(
                "sess.run(None, {'input': X_digits[:1].astype(np.float32)})",
                number=10000,
                globals=globals(),
            )
        )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    benchmark
    scikit-learn
    0.8991796999998769
    onnxruntime
    0.25503109999954177
    benchmark
    scikit-learn
    1.1041783999990002
    onnxruntime
    0.1891211000001931


.. GENERATED FROM PYTHON SOURCE LINES 145-146

**Versions used for this example**

.. GENERATED FROM PYTHON SOURCE LINES 146-152

.. code-block:: Python


    print("numpy:", numpy.__version__)
    print("scikit-learn:", sklearn.__version__)
    print("onnx: ", onnx.__version__)
    print("onnxruntime: ", rt.__version__)
    print("skl2onnx: ", skl2onnx.__version__)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    numpy: 1.26.4
    scikit-learn: 1.6.dev0
    onnx:  1.17.0
    onnxruntime:  1.18.0+cu118
    skl2onnx:  1.17.0


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 4.925 seconds)


.. _sphx_glr_download_auto_examples_plot_benchmark_pipeline.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_benchmark_pipeline.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_benchmark_pipeline.py `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_