Convert a pipeline with an XGBoost model#
sklearn-onnx only converts scikit-learn models into ONNX, but many libraries implement the scikit-learn API so that their models can be included in a scikit-learn pipeline. This example considers a pipeline that includes an XGBoost model. sklearn-onnx can convert the whole pipeline as long as it knows the converter associated with XGBClassifier. Let’s see how to do it.
Train an XGBoost classifier#
import numpy
import onnxruntime as rt
from sklearn.datasets import load_iris, load_diabetes, make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier, XGBRegressor, DMatrix, train as train_xgb
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx import convert_sklearn, to_onnx, update_registered_converter
from skl2onnx.common.shape_calculator import (
    calculate_linear_classifier_output_shapes,
    calculate_linear_regressor_output_shapes,
)
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost
from onnxmltools.convert import convert_xgboost as convert_xgboost_booster
data = load_iris()
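X = data.data[:, :2]
y = data.target

# Shuffle the rows so the first samples do not all belong to the same class.
ind = numpy.arange(X.shape[0])
numpy.random.shuffle(ind)
X = X[ind, :].copy()
y = y[ind].copy()

pipe = Pipeline([("scaler", StandardScaler()), ("xgb", XGBClassifier(n_estimators=3))])
pipe.fit(X, y)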
# The conversion fails but it is expected.
try:
    convert_sklearn(
        pipe,
        "pipeline_xgboost",
        [("input", FloatTensorType([None, 2]))],
        target_opset={"": 12, "ai.onnx.ml": 2},
    )
except Exception as e:
    print(e)
# The error message says that no converter was found
# for XGBoost models. By default, sklearn-onnx
# only handles models from scikit-learn, but it can
# be extended to every model following the scikit-learn
# API as long as the module knows there exists a converter
# for every model used in a pipeline. That's why
# we need to register a converter.
Unable to find a shape calculator for type '<class 'xgboost.sklearn.XGBClassifier'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.
Register the converter for XGBClassifier#
The converter is implemented in onnxmltools: onnxmltools… and the shape calculator: onnxmltools…
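update_registered_converter(
    XGBClassifier,
    "XGBoostXGBClassifier",
    calculate_linear_classifier_output_shapes,
    convert_xgboost,
    options={"nocl": [True, False], "zipmap": [True, False, "columns"]},
)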
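To make the idea of registration concrete, here is a toy registry in plain Python. It is not how sklearn-onnx is implemented; `REGISTRY`, `register`, `convert`, and `FakeXGBClassifier` are hypothetical names invented for this sketch. It only illustrates the principle: a lookup keyed by model type holds a shape calculator and a converter, and conversion fails when a type is missing from the lookup.

```python
# Toy sketch only: a lookup table from model type to
# (alias, shape calculator, converter). All names are hypothetical.
REGISTRY = {}


def register(model_type, alias, shape_fct, convert_fct):
    """Remember which functions handle a given model type."""
    REGISTRY[model_type] = (alias, shape_fct, convert_fct)


def convert(model, n_features):
    """Look up the functions registered for type(model) and call them."""
    try:
        alias, shape_fct, convert_fct = REGISTRY[type(model)]
    except KeyError:
        raise RuntimeError(
            f"Unable to find a shape calculator for type '{type(model)}'."
        ) from None
    return {
        "alias": alias,
        "output_shape": shape_fct(model, n_features),
        "graph": convert_fct(model),
    }


class FakeXGBClassifier:
    """Stand-in for a third-party model with three classes."""

    n_classes = 3


model = FakeXGBClassifier()

# Before registration, the lookup fails, much like the error shown earlier.
try:
    convert(model, n_features=2)
    failed = False
except RuntimeError as exc:
    failed = True
    print(exc)

# After registration, the same call succeeds.
register(
    FakeXGBClassifier,
    "ToyXGBClassifier",
    lambda m, n_features: (None, m.n_classes),  # batch size is unknown
    lambda m: f"<graph for {type(m).__name__}>",
)
result = convert(model, n_features=2)
print(result)
```

The real `update_registered_converter` call follows the same order of arguments used in this page: model class, alias, shape calculator, converter, and optional converter options.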
Convert again#
model_onnx = convert_sklearn(
    pipe,
    "pipeline_xgboost",
    [("input", FloatTensorType([None, 2]))],
    target_opset={"": 12, "ai.onnx.ml": 2},
)

# And save.
with open("pipeline_xgboost.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())
Compare the predictions#
Predictions with XGBoost.
print("predict", pipe.predict(X[:5]))
print("predict_proba", pipe.predict_proba(X[:1]))
predict [0 2 2 1 0]
predict_proba [[0.6735462 0.16588391 0.16056988]]
Predictions with onnxruntime.
sess = rt.InferenceSession("pipeline_xgboost.onnx", providers=["CPUExecutionProvider"])
pred_onx = sess.run(None, {"input": X[:5].astype(numpy.float32)})
print("predict", pred_onx[0])
print("predict_proba", pred_onx[1][:1])
predict [0 2 2 1 0]
predict_proba [{0: 0.6735461950302124, 1: 0.16588392853736877, 2: 0.16056989133358002}]
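The two outputs use different containers: `predict_proba` returns one row of probabilities per sample, while the ONNX model returns a list of dictionaries keyed by class label. The sketch below compares them using only the numbers printed above and the standard library. The helper `zipmap_to_rows` and the tolerances are assumptions made for this illustration.

```python
import math

# Values copied from the outputs printed above.
sklearn_proba = [[0.6735462, 0.16588391, 0.16056988]]
onnx_proba = [
    {0: 0.6735461950302124, 1: 0.16588392853736877, 2: 0.16056989133358002}
]


def zipmap_to_rows(list_of_dicts):
    """Turn [{label: prob, ...}, ...] into rows ordered by class label."""
    return [[row[label] for label in sorted(row)] for row in list_of_dicts]


onnx_rows = zipmap_to_rows(onnx_proba)

# Compare element by element with a tolerance suited to float32 outputs.
same = all(
    math.isclose(a, b, rel_tol=1e-5, abs_tol=1e-6)
    for row_a, row_b in zip(sklearn_proba, onnx_rows)
    for a, b in zip(row_a, row_b)
)
print(same)
```

An exact equality test would be too strict here, since the printed values differ in their last digits.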
Same example with XGBRegressor#
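# XGBRegressor needs its own registration, as XGBClassifier did.
update_registered_converter(
    XGBRegressor,
    "XGBoostXGBRegressor",
    calculate_linear_regressor_output_shapes,
    convert_xgboost,
)

data = load_diabetes()
x = data.data
y = data.target
X_train, X_test, y_train, _ = train_test_split(x, y, test_size=0.5)

pipe = Pipeline([("scaler", StandardScaler()), ("xgb", XGBRegressor(n_estimators=3))])
pipe.fit(X_train, y_train)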
print("predict", pipe.predict(X_test[:5]))
predict [ 50.63668 139.96634 89.57979 92.50203 34.25794]
onx = to_onnx(
    pipe, X_train.astype(numpy.float32), target_opset={"": 12, "ai.onnx.ml": 2}
)

sess = rt.InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])
pred_onx = sess.run(None, {"X": X_test[:5].astype(numpy.float32)})
print("predict", pred_onx[0].ravel())
predict [ 50.63668 139.96634 89.57979 92.50203 34.25794]
Some discrepancies may appear. In that case, you should read Issues when switching to float.
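One possible source of such discrepancies is that the ONNX graph in this example takes float32 inputs while the original data is float64. The sketch below uses only the standard library to show how rounding a value to float32 can change the outcome of a threshold test of the kind decision trees perform. The helper `to_float32` and the threshold value are illustrative assumptions and are not taken from the models above.

```python
import struct


def to_float32(value):
    """Round a Python float (float64) to the nearest float32."""
    return struct.unpack("f", struct.pack("f", value))[0]


x64 = 0.1              # feature value stored as float64
x32 = to_float32(x64)  # same value after a cast to float32
threshold = 0.1        # hypothetical split threshold kept in float64

print(x32 - x64)         # small positive rounding error
print(x64 <= threshold)  # True: the sample goes to the left branch
print(x32 <= threshold)  # False: the cast sample goes to the right branch
```

A sample that falls on the other side of a single split can end up in a different leaf, which is why a tiny rounding error may produce a visibly different prediction.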
Same with a Booster#
A Booster cannot be inserted in a pipeline. It requires a different conversion function because it does not follow the scikit-learn API.
x, y = make_classification(
    n_classes=2, n_features=5, n_samples=100, random_state=42, n_informative=3
)
X_train, X_test, y_train, _ = train_test_split(x, y, test_size=0.5, random_state=42)

dtrain = DMatrix(X_train, label=y_train)

param = {"objective": "multi:softmax", "num_class": 3}
bst = train_xgb(param, dtrain, 10)

initial_type = [("float_input", FloatTensorType([None, X_train.shape[1]]))]

# The conversion may fail if the installed versions are incompatible.
try:
    onx = convert_xgboost_booster(bst, "name", initial_types=initial_type)
    cont = True
except AssertionError as e:
    print("XGBoost is too recent or onnxmltools too old.", e)
    cont = False

if cont:
    sess = rt.InferenceSession(
        onx.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    input_name = sess.get_inputs()[0].name
    label_name = sess.get_outputs()[0].name
    pred_onx = sess.run([label_name], {input_name: X_test.astype(numpy.float32)})[0]
    print(pred_onx)
[0 0 1 1 0 1 0 1 0 1 0 0 1 1 1 0 0 1 1 1 1 0 0 1 0 0 0 1 1 1 0 1 1 0 1 1 1
0 1 1 1 0 0 1 1 0 0 0 1 0]