.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_benchmark_pipeline.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_benchmark_pipeline.py:


Benchmark a pipeline
====================

The following example checks every step in a pipeline,
then compares and benchmarks the predictions.

Create a pipeline
+++++++++++++++++

We reuse the pipeline implemented in the example
`Pipelining: chaining a PCA and a logistic regression `_.
There is one change because
`ONNX-ML Imputer `_
does not handle the string type. This cannot be part of the final ONNX pipeline
and must be removed. Look for the comment starting with ``---`` below.

.. GENERATED FROM PYTHON SOURCE LINES 23-52

.. code-block:: Python


    import skl2onnx
    import onnx
    import sklearn
    import numpy
    from skl2onnx.helpers import collect_intermediate_steps
    from timeit import timeit
    from skl2onnx.helpers import compare_objects
    import onnxruntime as rt
    from onnxconverter_common.data_types import FloatTensorType
    from skl2onnx import convert_sklearn
    import numpy as np
    import pandas as pd

    from sklearn import datasets
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    logistic = LogisticRegression()
    pca = PCA()
    pipe = Pipeline(steps=[("pca", pca), ("logistic", logistic)])

    digits = datasets.load_digits()
    X_digits = digits.data[:1000]
    y_digits = digits.target[:1000]

    pipe.fit(X_digits, y_digits)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Pipeline(steps=[('pca', PCA()), ('logistic', LogisticRegression())])
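
Before converting anything, it can help to confirm that the fitted pipeline is
just its two steps applied one after the other, since the rest of this example
inspects those steps individually. The block below is a minimal, standalone
sketch of that check and is not part of the original script. It refits the same
pipeline under shorter names (``X``, ``y``) and introduces the illustrative
variables ``reduced``, ``manual_proba``, ``pipeline_proba`` and ``same``.

.. code-block:: Python

    import numpy as np
    from sklearn import datasets
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    # Same data and pipeline definition as above, rebuilt so the sketch
    # runs on its own.
    digits = datasets.load_digits()
    X, y = digits.data[:1000], digits.target[:1000]

    pipe = Pipeline(steps=[("pca", PCA()), ("logistic", LogisticRegression())])
    pipe.fit(X, y)

    # Run the two fitted steps by hand on the first two rows.
    reduced = pipe.named_steps["pca"].transform(X[:2])
    manual_proba = pipe.named_steps["logistic"].predict_proba(reduced)

    # The pipeline should give the same probabilities in one call.
    pipeline_proba = pipe.predict_proba(X[:2])
    same = np.allclose(manual_proba, pipeline_proba)
    print("step-by-step matches pipeline:", same)

If the two results agree, any later discrepancy with onnxruntime can be
attributed to the conversion of a specific step rather than to the way the
pipeline chains its steps.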


.. GENERATED FROM PYTHON SOURCE LINES 53-55

Conversion to ONNX
++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 55-70

.. code-block:: Python


    initial_types = [("input", FloatTensorType((None, X_digits.shape[1])))]
    model_onnx = convert_sklearn(pipe, initial_types=initial_types, target_opset=12)

    sess = rt.InferenceSession(
        model_onnx.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    print("skl predict_proba")
    print(pipe.predict_proba(X_digits[:2]))
    onx_pred = sess.run(None, {"input": X_digits[:2].astype(np.float32)})[1]
    df = pd.DataFrame(onx_pred)
    print("onnx predict_proba")
    print(df.values)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    skl predict_proba
    [[9.99998530e-01 7.81608915e-19 4.87445983e-10 1.79842282e-08
      3.58700553e-10 1.18138026e-06 4.14411050e-08 1.48275026e-07
      2.50162856e-08 5.51240033e-08]
     [1.37889361e-14 9.99999324e-01 9.17867405e-11 8.30390363e-13
      2.57277806e-07 8.84035067e-12 5.11781433e-11 2.83346409e-11
      4.18965301e-07 1.32796354e-13]]
    onnx predict_proba
    [[9.99998569e-01 7.81611026e-19 4.87444585e-10 1.79842026e-08
      3.58700042e-10 1.18137689e-06 4.14409520e-08 1.48274751e-07
      2.50162131e-08 5.51239410e-08]
     [1.37888807e-14 9.99999344e-01 9.17865159e-11 8.30387679e-13
      2.57277748e-07 8.84032951e-12 5.11779785e-11 2.83345725e-11
      4.18964021e-07 1.32796280e-13]]


.. GENERATED FROM PYTHON SOURCE LINES 71-73

Comparing outputs
+++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 73-77

.. code-block:: Python


    compare_objects(pipe.predict_proba(X_digits[:2]), onx_pred)
    # No exception so they are the same.


.. GENERATED FROM PYTHON SOURCE LINES 78-80

Benchmarks
++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 80-92

.. code-block:: Python


    print("scikit-learn")
    print(timeit("pipe.predict_proba(X_digits[:1])", number=10000, globals=globals()))
    print("onnxruntime")
    print(
        timeit(
            "sess.run(None, {'input': X_digits[:1].astype(np.float32)})[1]",
            number=10000,
            globals=globals(),
        )
    )


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    scikit-learn
    2.1023744829981297
    onnxruntime
    0.12994074699963676


.. GENERATED FROM PYTHON SOURCE LINES 93-101

Intermediate steps
++++++++++++++++++

Suppose the final output is wrong and we need to find out
which component of the pipeline is failing.
The following method modifies the scikit-learn pipeline
to capture the intermediate outputs and produces a smaller
ONNX graph for every operator.

.. GENERATED FROM PYTHON SOURCE LINES 101-145

.. code-block:: Python


    steps = collect_intermediate_steps(pipe, "pipeline", initial_types)

    assert len(steps) == 2

    pipe.predict_proba(X_digits[:2])

    for _i, step in enumerate(steps):
        onnx_step = step["onnx_step"]
        sess = rt.InferenceSession(
            onnx_step.SerializeToString(), providers=["CPUExecutionProvider"]
        )
        onnx_outputs = sess.run(None, {"input": X_digits[:2].astype(np.float32)})
        skl_outputs = step["model"]._debug.outputs
        if "transform" in skl_outputs:
            compare_objects(skl_outputs["transform"], onnx_outputs[0])
            print("benchmark", step["model"].__class__)
            print("scikit-learn")
            print(
                timeit(
                    "step['model'].transform(X_digits[:1])", number=10000, globals=globals()
                )
            )
        else:
            compare_objects(skl_outputs["predict_proba"], onnx_outputs[1])
            print("benchmark", step["model"].__class__)
            print("scikit-learn")
            print(
                timeit(
                    "step['model'].predict_proba(X_digits[:1])",
                    number=10000,
                    globals=globals(),
                )
            )
        print("onnxruntime")
        print(
            timeit(
                "sess.run(None, {'input': X_digits[:1].astype(np.float32)})",
                number=10000,
                globals=globals(),
            )
        )


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    benchmark
    scikit-learn
    0.6131834360021458
    onnxruntime
    0.07247600700065959
    benchmark
    scikit-learn
    0.8395350790015073
    onnxruntime
    0.10347538899804931


.. GENERATED FROM PYTHON SOURCE LINES 146-147

**Versions used for this example**

.. GENERATED FROM PYTHON SOURCE LINES 147-153

.. code-block:: Python


    print("numpy:", numpy.__version__)
    print("scikit-learn:", sklearn.__version__)
    print("onnx: ", onnx.__version__)
    print("onnxruntime: ", rt.__version__)
    print("skl2onnx: ", skl2onnx.__version__)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    numpy: 2.2.0
    scikit-learn: 1.6.0
    onnx:  1.18.0
    onnxruntime:  1.21.0+cu126
    skl2onnx:  1.18.0


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 4.029 seconds)


.. _sphx_glr_download_auto_examples_plot_benchmark_pipeline.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_benchmark_pipeline.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_benchmark_pipeline.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_benchmark_pipeline.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_