.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_tutorial/plot_gexternal_xgboost.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_tutorial_plot_gexternal_xgboost.py:


Convert a pipeline with an XGBoost model
========================================

.. index:: XGBoost

:epkg:`sklearn-onnx` only converts :epkg:`scikit-learn` models into
:epkg:`ONNX`, but many libraries implement the :epkg:`scikit-learn` API
so that their models can be included in a :epkg:`scikit-learn` pipeline.
This example considers a pipeline that includes an :epkg:`XGBoost` model.
:epkg:`sklearn-onnx` can convert the whole pipeline as long as it
knows the converter associated with an *XGBClassifier*.
Let's see how to do it.

Train an XGBoost classifier
+++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 21-70

.. code-block:: Python

    import numpy
    import onnxruntime as rt
    from sklearn.datasets import load_iris, load_diabetes, make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBClassifier, XGBRegressor, DMatrix, train as train_xgb
    from skl2onnx.common.data_types import FloatTensorType
    from skl2onnx import convert_sklearn, to_onnx, update_registered_converter
    from skl2onnx.common.shape_calculator import (
        calculate_linear_classifier_output_shapes,
        calculate_linear_regressor_output_shapes,
    )
    from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost
    from onnxmltools.convert import convert_xgboost as convert_xgboost_booster

    data = load_iris()
    X = data.data[:, :2]
    y = data.target

    # Shuffle the rows so that the classes are mixed.
    ind = numpy.arange(X.shape[0])
    numpy.random.shuffle(ind)
    X = X[ind, :].copy()
    y = y[ind].copy()

    pipe = Pipeline([("scaler", StandardScaler()), ("xgb", XGBClassifier(n_estimators=3))])
    pipe.fit(X, y)

    # The conversion fails at this point, which is expected.

    try:
        convert_sklearn(
            pipe,
            "pipeline_xgboost",
            [("input", FloatTensorType([None, 2]))],
            target_opset={"": 12, "ai.onnx.ml": 2},
        )
    except Exception as e:
        print(e)

    # The error message says that no converter was found
    # for :epkg:`XGBoost` models. By default, :epkg:`sklearn-onnx`
    # only handles models from :epkg:`scikit-learn`, but it can
    # be extended to any model following the :epkg:`scikit-learn`
    # API as long as the module knows a converter
    # for every model used in the pipeline. That is why
    # we need to register a converter.

.. GENERATED FROM PYTHON SOURCE LINES 71-82

Register the converter for XGBClassifier
++++++++++++++++++++++++++++++++++++++++

The converter is implemented in :epkg:`onnxmltools`:
`onnxmltools...XGBoost.py `_,
and so is the shape calculator:
`onnxmltools...Classifier.py `_.

.. GENERATED FROM PYTHON SOURCE LINES 82-91

.. code-block:: Python

    update_registered_converter(
        XGBClassifier,
        "XGBoostXGBClassifier",
        calculate_linear_classifier_output_shapes,
        convert_xgboost,
        options={"nocl": [True, False], "zipmap": [True, False, "columns"]},
    )

.. GENERATED FROM PYTHON SOURCE LINES 92-94

Convert again
+++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 94-106

.. code-block:: Python

    model_onnx = convert_sklearn(
        pipe,
        "pipeline_xgboost",
        [("input", FloatTensorType([None, 2]))],
        target_opset={"": 12, "ai.onnx.ml": 2},
    )

    # And save.
    with open("pipeline_xgboost.onnx", "wb") as f:
        f.write(model_onnx.SerializeToString())

.. GENERATED FROM PYTHON SOURCE LINES 107-111

Compare the predictions
+++++++++++++++++++++++

Predictions with XGBoost.

.. GENERATED FROM PYTHON SOURCE LINES 111-115

.. code-block:: Python

    print("predict", pipe.predict(X[:5]))
    print("predict_proba", pipe.predict_proba(X[:1]))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    predict [0 2 1 1 2]
    predict_proba [[0.69600695 0.1526681 0.15132491]]

.. GENERATED FROM PYTHON SOURCE LINES 116-117

Predictions with onnxruntime.

.. GENERATED FROM PYTHON SOURCE LINES 117-123

.. code-block:: Python

    sess = rt.InferenceSession("pipeline_xgboost.onnx", providers=["CPUExecutionProvider"])
    pred_onx = sess.run(None, {"input": X[:5].astype(numpy.float32)})
    print("predict", pred_onx[0])
    print("predict_proba", pred_onx[1][:1])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    predict [0 2 1 1 2]
    predict_proba [{0: 0.6960069537162781, 1: 0.15266810357570648, 2: 0.15132491290569305}]

.. GENERATED FROM PYTHON SOURCE LINES 124-126

Same example with XGBRegressor
++++++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 126-145

.. code-block:: Python

    update_registered_converter(
        XGBRegressor,
        "XGBoostXGBRegressor",
        calculate_linear_regressor_output_shapes,
        convert_xgboost,
    )

    data = load_diabetes()
    x = data.data
    y = data.target
    X_train, X_test, y_train, _ = train_test_split(x, y, test_size=0.5)

    pipe = Pipeline([("scaler", StandardScaler()), ("xgb", XGBRegressor(n_estimators=3))])
    pipe.fit(X_train, y_train)

    print("predict", pipe.predict(X_test[:5]))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    predict [167.95638 209.4882 112.31891 127.7238 126.65028]

.. GENERATED FROM PYTHON SOURCE LINES 146-147

Conversion to ONNX and predictions with onnxruntime.

.. GENERATED FROM PYTHON SOURCE LINES 147-156

.. code-block:: Python

    onx = to_onnx(
        pipe, X_train.astype(numpy.float32), target_opset={"": 12, "ai.onnx.ml": 2}
    )

    sess = rt.InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])
    pred_onx = sess.run(None, {"X": X_test[:5].astype(numpy.float32)})
    print("predict", pred_onx[0].ravel())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    predict [167.95638 209.4882 112.31891 127.7238 126.65028]

.. GENERATED FROM PYTHON SOURCE LINES 157-159

Some discrepancies may appear. In that case,
you should read :ref:`l-example-discrepencies-float-double`.

.. GENERATED FROM PYTHON SOURCE LINES 161-167

Same with a Booster
+++++++++++++++++++

A booster cannot be inserted in a pipeline. It requires
a different conversion function because it does not
follow the :epkg:`scikit-learn` API.

.. GENERATED FROM PYTHON SOURCE LINES 167-195

.. code-block:: Python

    x, y = make_classification(
        n_classes=2, n_features=5, n_samples=100, random_state=42, n_informative=3
    )
    X_train, X_test, y_train, _ = train_test_split(x, y, test_size=0.5, random_state=42)

    dtrain = DMatrix(X_train, label=y_train)

    # The dataset only contains labels 0 and 1 although num_class is 3.
    param = {"objective": "multi:softmax", "num_class": 3}
    bst = train_xgb(param, dtrain, 10)

    initial_type = [("float_input", FloatTensorType([None, X_train.shape[1]]))]

    # The conversion may fail if the installed versions are incompatible.
    try:
        onx = convert_xgboost_booster(bst, "name", initial_types=initial_type)
        cont = True
    except AssertionError as e:
        print("XGBoost is too recent or onnxmltools too old.", e)
        cont = False

    if cont:
        sess = rt.InferenceSession(
            onx.SerializeToString(), providers=["CPUExecutionProvider"]
        )
        input_name = sess.get_inputs()[0].name
        label_name = sess.get_outputs()[0].name
        pred_onx = sess.run([label_name], {input_name: X_test.astype(numpy.float32)})[0]
        print(pred_onx)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [0 0 1 1 0 1 0 1 0 1 0 0 1 1 1 0 0 1 1 1 1 0 0 1 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 0 0 1 1 0 0 0 1 0]

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.126 seconds)

.. _sphx_glr_download_auto_tutorial_plot_gexternal_xgboost.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_gexternal_xgboost.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_gexternal_xgboost.py `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
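
The comparisons in this example are made by reading printed output.
As a minimal sketch of an automated check, the snippet below takes the
two probability outputs printed for the classifier (copied here as
literals), flattens the list of dictionaries returned by onnxruntime
into rows, and compares both results within a tolerance. The helper
name ``dicts_to_rows`` and the tolerance ``1e-6`` are illustrative
assumptions and are not part of any library. The snippet is plain
Python so that it runs without the packages used above.

.. code-block:: Python

    import math


    def dicts_to_rows(rows):
        """Turn a list of {class_label: probability} dicts into nested lists.

        Columns follow the sorted class labels of the first row.
        """
        labels = sorted(rows[0])
        return [[row[label] for label in labels] for row in rows]


    # Values copied from the printed outputs of the classifier example.
    expected = [[0.69600695, 0.1526681, 0.15132491]]
    onnx_rows = [{0: 0.6960069537162781, 1: 0.15266810357570648, 2: 0.15132491290569305}]

    got = dicts_to_rows(onnx_rows)

    # The tolerance is an assumption: float32 results seldom agree to the last digit.
    pairs = [(e, g) for e_row, g_row in zip(expected, got) for e, g in zip(e_row, g_row)]
    assert all(math.isclose(e, g, abs_tol=1e-6) for e, g in pairs)
    max_abs_diff = max(abs(e - g) for e, g in pairs)
    print("max absolute difference:", max_abs_diff)

In a real script, ``expected`` would come from ``pipe.predict_proba``
and ``onnx_rows`` from the second output of ``sess.run``, and a looser
tolerance may be needed when discrepancies between float and double
computations appear.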