.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/plot_pipeline_xgboost.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_plot_pipeline_xgboost.py: Convert a pipeline with a XGBoost model ======================================== .. index:: XGBoost *sklearn-onnx* only converts *scikit-learn* models into *ONNX* but many libraries implement *scikit-learn* API so that their models can be included in a *scikit-learn* pipeline. This example considers a pipeline including a *XGBoost* model. *sklearn-onnx* can convert the whole pipeline as long as it knows the converter associated to a *XGBClassifier*. Let's see how to do it. Train a XGBoost classifier ++++++++++++++++++++++++++ .. GENERATED FROM PYTHON SOURCE LINES 20-77 .. code-block:: Python import os import numpy import matplotlib.pyplot as plt import onnx from onnx.tools.net_drawer import GetPydotGraph, GetOpNodeProducer import onnxruntime as rt import sklearn from sklearn.datasets import load_iris from sklearn.pipeline import Pipeline from sklearn.preprocessing import StandardScaler import xgboost from xgboost import XGBClassifier import skl2onnx from skl2onnx.common.data_types import FloatTensorType from skl2onnx import convert_sklearn, update_registered_converter from skl2onnx.common.shape_calculator import ( calculate_linear_classifier_output_shapes, ) import onnxmltools from onnxmltools.convert.xgboost.operator_converters.XGBoost import ( convert_xgboost, ) import onnxmltools.convert.common.data_types data = load_iris() X = data.data[:, :2] y = data.target ind = numpy.arange(X.shape[0]) numpy.random.shuffle(ind) X = X[ind, :].copy() y = y[ind].copy() pipe = Pipeline([("scaler", StandardScaler()), ("lgbm", XGBClassifier(n_estimators=3))]) pipe.fit(X, y) # The conversion fails but it is expected. try: convert_sklearn( pipe, "pipeline_xgboost", [("input", FloatTensorType([None, 2]))], target_opset={"": 12, "ai.onnx.ml": 2}, ) except Exception as e: print(e) # The error message tells no converter was found # for XGBoost models. By default, *sklearn-onnx* # only handles models from *scikit-learn* but it can # be extended to every model following *scikit-learn* # API as long as the module knows there exists a converter # for every model used in a pipeline. That's why # we need to register a converter. .. rst-class:: sphx-glr-script-out .. code-block:: none Unable to find a shape calculator for type ''. It usually means the pipeline being converted contains a transformer or a predictor with no corresponding converter implemented in sklearn-onnx. If the converted is implemented in another library, you need to register the converted so that it can be used by sklearn-onnx (function update_registered_converter). If the model is not yet covered by sklearn-onnx, you may raise an issue to https://github.com/onnx/sklearn-onnx/issues to get the converter implemented or even contribute to the project. If the model is a custom model, a new converter must be implemented. Examples can be found in the gallery. .. GENERATED FROM PYTHON SOURCE LINES 78-89 Register the converter for XGBClassifier ++++++++++++++++++++++++++++++++++++++++ The converter is implemented in *onnxmltools*: `onnxmltools...XGBoost.py `_. and the shape calculator: `onnxmltools...Classifier.py `_. .. GENERATED FROM PYTHON SOURCE LINES 91-92 Then we import the converter and shape calculator. .. GENERATED FROM PYTHON SOURCE LINES 94-95 Let's register the new converter. .. GENERATED FROM PYTHON SOURCE LINES 95-103 .. code-block:: Python update_registered_converter( XGBClassifier, "XGBoostXGBClassifier", calculate_linear_classifier_output_shapes, convert_xgboost, options={"nocl": [True, False], "zipmap": [True, False, "columns"]}, ) .. GENERATED FROM PYTHON SOURCE LINES 104-106 Convert again +++++++++++++ .. GENERATED FROM PYTHON SOURCE LINES 106-118 .. code-block:: Python model_onnx = convert_sklearn( pipe, "pipeline_xgboost", [("input", FloatTensorType([None, 2]))], target_opset={"": 12, "ai.onnx.ml": 2}, ) # And save. with open("pipeline_xgboost.onnx", "wb") as f: f.write(model_onnx.SerializeToString()) .. GENERATED FROM PYTHON SOURCE LINES 119-123 Compare the predictions +++++++++++++++++++++++ Predictions with XGBoost. .. GENERATED FROM PYTHON SOURCE LINES 123-127 .. code-block:: Python print("predict", pipe.predict(X[:5])) print("predict_proba", pipe.predict_proba(X[:1])) .. rst-class:: sphx-glr-script-out .. code-block:: none predict [1 2 0 0 2] predict_proba [[0.16737716 0.48578635 0.34683657]] .. GENERATED FROM PYTHON SOURCE LINES 128-129 Predictions with onnxruntime. .. GENERATED FROM PYTHON SOURCE LINES 129-135 .. code-block:: Python sess = rt.InferenceSession("pipeline_xgboost.onnx", providers=["CPUExecutionProvider"]) pred_onx = sess.run(None, {"input": X[:5].astype(numpy.float32)}) print("predict", pred_onx[0]) print("predict_proba", pred_onx[1][:1]) .. rst-class:: sphx-glr-script-out .. code-block:: none predict [1 2 0 0 2] predict_proba [{0: 0.16737714409828186, 1: 0.4857862889766693, 2: 0.34683650732040405}] .. GENERATED FROM PYTHON SOURCE LINES 136-138 Display the ONNX graph ++++++++++++++++++++++ .. GENERATED FROM PYTHON SOURCE LINES 138-156 .. code-block:: Python pydot_graph = GetPydotGraph( model_onnx.graph, name=model_onnx.graph.name, rankdir="TB", node_producer=GetOpNodeProducer( "docstring", color="yellow", fillcolor="yellow", style="filled" ), ) pydot_graph.write_dot("pipeline.dot") os.system("dot -O -Gdpi=300 -Tpng pipeline.dot") image = plt.imread("pipeline.dot.png") fig, ax = plt.subplots(figsize=(40, 20)) ax.imshow(image) ax.axis("off") .. image-sg:: /auto_examples/images/sphx_glr_plot_pipeline_xgboost_001.png :alt: plot pipeline xgboost :srcset: /auto_examples/images/sphx_glr_plot_pipeline_xgboost_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none (np.float64(-0.5), np.float64(2485.5), np.float64(2558.5), np.float64(-0.5)) .. GENERATED FROM PYTHON SOURCE LINES 157-158 **Versions used for this example** .. GENERATED FROM PYTHON SOURCE LINES 158-166 .. code-block:: Python print("numpy:", numpy.__version__) print("scikit-learn:", sklearn.__version__) print("onnx: ", onnx.__version__) print("onnxruntime: ", rt.__version__) print("skl2onnx: ", skl2onnx.__version__) print("onnxmltools: ", onnxmltools.__version__) print("xgboost: ", xgboost.__version__) .. rst-class:: sphx-glr-script-out .. code-block:: none numpy: 2.3.1 scikit-learn: 1.6.1 onnx: 1.19.0 onnxruntime: 1.23.0 skl2onnx: 1.19.1 onnxmltools: 1.14.0 xgboost: 3.0.2 .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 4.342 seconds) .. _sphx_glr_download_auto_examples_plot_pipeline_xgboost.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_pipeline_xgboost.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_pipeline_xgboost.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_pipeline_xgboost.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_