.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_gpr.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_gpr.py:

.. _l-gpr-example:

Discrepancies with GaussianProcessRegressor: use of double
============================================================

*GaussianProcessRegressor* involves many matrix operations
which may require double precision. *sklearn-onnx* uses
single-precision floats by default, but for this particular
model it is better to use doubles. Let's see how to create
an ONNX file using doubles.

Train a model
+++++++++++++

A very basic example using *GaussianProcessRegressor*
on the diabetes dataset.

.. GENERATED FROM PYTHON SOURCE LINES 24-45

.. code-block:: Python

    import pprint
    import numpy
    import sklearn
    from sklearn.datasets import load_diabetes
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import DotProduct, RBF
    from sklearn.model_selection import train_test_split
    import onnx
    import onnxruntime as rt
    import skl2onnx
    from skl2onnx.common.data_types import FloatTensorType, DoubleTensorType
    from skl2onnx import convert_sklearn

    dataset = load_diabetes()
    X, y = dataset.data, dataset.target
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    gpr = GaussianProcessRegressor(DotProduct() + RBF(), alpha=1.0)
    gpr.fit(X_train, y_train)
    print(gpr)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    /home/xadupre/vv/this312/lib/python3.12/site-packages/sklearn/gaussian_process/_gpr.py:660: ConvergenceWarning: lbfgs failed to converge (status=2):
    ABNORMAL_TERMINATION_IN_LNSRCH.
    Increase the number of iterations (max_iter) or scale the data as shown in:
        https://scikit-learn.org/stable/modules/preprocessing.html
      _check_optimize_result("lbfgs", opt_res)
    /home/xadupre/vv/this312/lib/python3.12/site-packages/sklearn/gaussian_process/kernels.py:442: ConvergenceWarning: The optimal value found for dimension 0 of parameter k2__length_scale is close to the specified lower bound 1e-05. Decreasing the bound and calling fit again may find a better value.
      warnings.warn(
    GaussianProcessRegressor(alpha=1.0, kernel=DotProduct(sigma_0=1) + RBF(length_scale=1))

.. GENERATED FROM PYTHON SOURCE LINES 46-51

First attempt to convert a model into ONNX
++++++++++++++++++++++++++++++++++++++++++

The documentation suggests the following way to
convert a model into ONNX.

.. GENERATED FROM PYTHON SOURCE LINES 51-61

.. code-block:: Python

    initial_type = [("X", FloatTensorType([None, X_train.shape[1]]))]
    onx = convert_sklearn(gpr, initial_types=initial_type, target_opset=12)

    sess = rt.InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])
    try:
        pred_onx = sess.run(None, {"X": X_test.astype(numpy.float32)})[0]
    except RuntimeError as e:
        print(str(e))

.. GENERATED FROM PYTHON SOURCE LINES 62-74

Second attempt: variable dimensions
+++++++++++++++++++++++++++++++++++

Even when the conversion goes well, the runtime may fail
to compute the prediction; the ``try``/``except`` block above
prints the error when that happens.
The previous snippet fixes the number of columns of the input,
which lets the runtime assume that every node output
has fixed dimensions as well.
That is not the case for this model.
We need to disable these checks by replacing
the fixed dimension with ``None``, as the next snippet does.

.. GENERATED FROM PYTHON SOURCE LINES 74-85
.. code-block:: Python

    initial_type = [("X", FloatTensorType([None, None]))]
    onx = convert_sklearn(gpr, initial_types=initial_type, target_opset=12)

    sess = rt.InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])
    pred_onx = sess.run(None, {"X": X_test.astype(numpy.float32)})[0]

    pred_skl = gpr.predict(X_test)
    print(pred_skl[:10])
    # pred_onx has shape (n_samples, 1): this prints only the first prediction.
    print(pred_onx[0, :10])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [199.16400146 144.28936768 109.57861328 183.99377441 169.99847412
     112.01519775 179.21099854 130.21044922 189.19836426 136.98278809]
    [557056.]

.. GENERATED FROM PYTHON SOURCE LINES 86-89

The differences seem quite large.
Let's confirm that by looking at the biggest ones.

.. GENERATED FROM PYTHON SOURCE LINES 89-94

.. code-block:: Python

    diff = numpy.sort(numpy.abs(numpy.squeeze(pred_skl) - numpy.squeeze(pred_onx)))[-5:]
    print(diff)
    print("min(Y)-max(Y):", min(y_test), max(y_test))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [556952.34741211 556953.92102051 556957.08154297 556958.05334473
     556961.00817871]
    min(Y)-max(Y): 39.0 341.0

.. GENERATED FROM PYTHON SOURCE LINES 95-111

Third attempt: use of double
++++++++++++++++++++++++++++

The model relies on several matrix computations, and
the matrices have coefficients with very different orders of
magnitude. It is difficult to approximate
the prediction made by *scikit-learn* if the converted
model sticks to single-precision floats. Double precision is needed.

The previous code requires one change: the inputs are now
declared as ``DoubleTensorType``. The conversion call itself is
unchanged; no extra ``dtype=numpy.float64`` parameter is passed.
With double inputs, real constant matrices such as the trained
coefficients are dumped as doubles and not as floats anymore,
which is consistent with the size increase measured later
in this example.

.. GENERATED FROM PYTHON SOURCE LINES 111-122
.. code-block:: Python

    initial_type = [("X", DoubleTensorType([None, None]))]
    onx64 = convert_sklearn(gpr, initial_types=initial_type, target_opset=12)

    sess64 = rt.InferenceSession(
        onx64.SerializeToString(), providers=["CPUExecutionProvider"]
    )

    pred_onx64 = sess64.run(None, {"X": X_test})[0]

    # pred_onx64 has shape (n_samples, 1): this prints only the first prediction.
    print(pred_onx64[0, :10])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [199.16298882]

.. GENERATED FROM PYTHON SOURCE LINES 123-124

The new differences look much better.

.. GENERATED FROM PYTHON SOURCE LINES 124-129

.. code-block:: Python

    diff = numpy.sort(numpy.abs(numpy.squeeze(pred_skl) - numpy.squeeze(pred_onx64)))[-5:]
    print(diff)
    print("min(Y)-max(Y):", min(y_test), max(y_test))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [0.00544517 0.0063448  0.00640814 0.00701725 0.00797183]
    min(Y)-max(Y): 39.0 341.0

.. GENERATED FROM PYTHON SOURCE LINES 130-136

Size increase
+++++++++++++

As a result, the ONNX model is almost twice as big
because every coefficient is stored as a double
and not as a float anymore.

.. GENERATED FROM PYTHON SOURCE LINES 136-142

.. code-block:: Python

    size32 = len(onx.SerializeToString())
    size64 = len(onx64.SerializeToString())
    print("ONNX with floats:", size32)
    print("ONNX with doubles:", size64)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ONNX with floats: 29226
    ONNX with doubles: 57050

.. GENERATED FROM PYTHON SOURCE LINES 143-154

return_std=True
+++++++++++++++

*GaussianProcessRegressor* is one of the models that
define an additional parameter for the ``predict`` method.
If called with ``return_std=True``, the method returns one more
result, and that needs to be reflected in the generated ONNX graph.
The converter needs to know that an extended graph is required.
That is done through the options mechanism
(see :ref:`l-conv-options`).

.. GENERATED FROM PYTHON SOURCE LINES 154-164
.. code-block:: Python

    initial_type = [("X", DoubleTensorType([None, None]))]

    options = {GaussianProcessRegressor: {"return_std": True}}
    try:
        onx64_std = convert_sklearn(
            gpr, initial_types=initial_type, options=options, target_opset=12
        )
    except RuntimeError as e:
        print(e)

.. GENERATED FROM PYTHON SOURCE LINES 165-169

If this conversion raises an error, it is because
*scikit-learn* computes internal variables on the first call
to the ``predict`` method. The converter needs them to be initialized:
call ``predict`` at least once, then convert again.

.. GENERATED FROM PYTHON SOURCE LINES 169-182

.. code-block:: Python

    gpr.predict(X_test[:1], return_std=True)
    onx64_std = convert_sklearn(
        gpr, initial_types=initial_type, options=options, target_opset=12
    )

    sess64_std = rt.InferenceSession(
        onx64_std.SerializeToString(), providers=["CPUExecutionProvider"]
    )

    pred_onx64_std = sess64_std.run(None, {"X": X_test[:5]})

    pprint.pprint(pred_onx64_std)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [array([[199.16298882],
           [144.28235043],
           [109.58064467],
           [183.99437845],
           [169.99982479]]),
     array([668.01881677, 785.8806495 , 561.56004495, 541.05432348,
             0.        ])]

.. GENERATED FROM PYTHON SOURCE LINES 183-184

Let's compare with the *scikit-learn* prediction.

.. GENERATED FROM PYTHON SOURCE LINES 184-187

.. code-block:: Python

    pprint.pprint(gpr.predict(X_test[:5], return_std=True))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    (array([199.16400146, 144.28936768, 109.57861328, 183.99377441,
           169.99798584]),
     array([1.01514412, 1.00504366, 1.01094833, 1.01163485, 1.0076308 ]))

.. GENERATED FROM PYTHON SOURCE LINES 188-189

The predicted means agree closely, but the standard deviations
returned by the ONNX model are far from those of *scikit-learn*.
Let's run a more thorough check on the whole test set.

.. GENERATED FROM PYTHON SOURCE LINES 189-200

.. code-block:: Python

    pred_onx64_std = sess64_std.run(None, {"X": X_test})
    pred_std = gpr.predict(X_test, return_std=True)

    diff = numpy.sort(
        numpy.abs(numpy.squeeze(pred_onx64_std[1]) - numpy.squeeze(pred_std[1]))
    )[-5:]
    print(diff)

.. rst-class:: sphx-glr-script-out
.. code-block:: none

    [ 952.49009449  969.80341373  995.20442978 1069.49191637 1079.44581069]

.. GENERATED FROM PYTHON SOURCE LINES 201-204

These discrepancies are not negligible: the largest ones are
around 1,000, whereas the standard deviations returned by
*scikit-learn* above are close to 1. The standard deviation
computed by the converted model should be checked carefully
before it is relied upon.

**Versions used for this example**

.. GENERATED FROM PYTHON SOURCE LINES 204-210

.. code-block:: Python

    print("numpy:", numpy.__version__)
    print("scikit-learn:", sklearn.__version__)
    print("onnx: ", onnx.__version__)
    print("onnxruntime: ", rt.__version__)
    print("skl2onnx: ", skl2onnx.__version__)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    numpy: 2.2.0
    scikit-learn: 1.6.0
    onnx:  1.18.0
    onnxruntime:  1.21.0+cu126
    skl2onnx:  1.18.0

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 5.965 seconds)
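
Why single precision falls short: a minimal sketch
++++++++++++++++++++++++++++++++++++++++++++++++++

The section on doubles argues that matrices whose coefficients have
very different orders of magnitude are hard to handle with floats.
The sketch below illustrates that effect on a three-term dot product.
It is not part of the original example and does not involve the converter
or the runtime: it only emulates single-precision rounding with the
standard library. The helper names ``to_float32``, ``dot_float32`` and
``dot_float64`` are hypothetical and introduced for illustration only.

.. code-block:: Python

    import struct


    def to_float32(x):
        """Round a Python float (a double) to the nearest single-precision value."""
        return struct.unpack("f", struct.pack("f", x))[0]


    def dot_float32(a, b):
        """Dot product where every intermediate result is rounded to float32."""
        total = 0.0
        for u, v in zip(a, b):
            product = to_float32(to_float32(u) * to_float32(v))
            total = to_float32(total + product)
        return total


    def dot_float64(a, b):
        """Dot product computed with regular Python floats (doubles)."""
        return sum(u * v for u, v in zip(a, b))


    # Coefficients with very different orders of magnitude.
    a = [1e8, 1.0, -1e8]
    b = [1.0, 1.0, 1.0]

    print("float64:", dot_float64(a, b))  # float64: 1.0
    print("float32:", dot_float32(a, b))  # float32: 0.0

Around ``1e8`` two consecutive single-precision values are 8 apart,
so adding ``1.0`` is lost entirely and the final result is ``0.0``
instead of ``1.0``. A Gaussian process prediction chains many such
sums, which is one plausible reason for the gap observed with floats.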