Discrepancies with GaussianProcessRegressor: use of double

The GaussianProcessRegressor involves many matrix operations which may require double precision. sklearn-onnx uses single floats by default, but for this particular model, it is better to use doubles. Let’s see how to create an ONNX file using doubles.
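
To get a sense of what single precision loses, here is a small illustration (not part of the original example): in float32, adding a small term to a much larger one can drop the small term entirely.

import numpy

big = numpy.float32(1e8)
print(big + numpy.float32(1) == big)  # True: the 1 is absorbed at float32 precision
print(numpy.float64(1e8) + 1.0 == 1e8)  # False: float64 keeps track of the 1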

Train a model

A very basic example using GaussianProcessRegressor on the diabetes dataset.

import pprint
import numpy
import sklearn
from sklearn.datasets import load_diabetes
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, RBF
from sklearn.model_selection import train_test_split
import onnx
import onnxruntime as rt
import skl2onnx
from skl2onnx.common.data_types import FloatTensorType, DoubleTensorType
from skl2onnx import convert_sklearn

dataset = load_diabetes()
X, y = dataset.data, dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y)
gpr = GaussianProcessRegressor(DotProduct() + RBF(), alpha=1.0)
gpr.fit(X_train, y_train)
print(gpr)
/home/xadupre/vv/this312/lib/python3.12/site-packages/sklearn/gaussian_process/_gpr.py:660: ConvergenceWarning: lbfgs failed to converge (status=2):
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  _check_optimize_result("lbfgs", opt_res)
/home/xadupre/vv/this312/lib/python3.12/site-packages/sklearn/gaussian_process/kernels.py:442: ConvergenceWarning: The optimal value found for dimension 0 of parameter k2__length_scale is close to the specified lower bound 1e-05. Decreasing the bound and calling fit again may find a better value.
  warnings.warn(
GaussianProcessRegressor(alpha=1.0,
                         kernel=DotProduct(sigma_0=1) + RBF(length_scale=1))
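
The convergence warnings suggest scaling the data. A possible variant, not used in the rest of this example, would wrap the regressor in a pipeline with a StandardScaler:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# scaling may help the optimizer converge (not applied below)
pipe = make_pipeline(StandardScaler(), GaussianProcessRegressor(DotProduct() + RBF(), alpha=1.0))
pipe.fit(X_train, y_train)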

First attempt to convert a model into ONNX

The documentation suggests the following way to convert a model into ONNX.

initial_type = [("X", FloatTensorType([None, X_train.shape[1]]))]
onx = convert_sklearn(gpr, initial_types=initial_type, target_opset=12)

sess = rt.InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])
try:
    pred_onx = sess.run(None, {"X": X_test.astype(numpy.float32)})[0]
except RuntimeError as e:
    print(str(e))

Second attempt: variable dimensions

Unfortunately, even though the conversion went well, the runtime fails to compute the prediction. The previous snippet of code imposes a fixed dimension on the input and therefore lets the runtime assume that every node output has fixed dimensions too. That is not the case for this model. We need to disable these checks by replacing the fixed dimensions with an empty value (see below).

initial_type = [("X", FloatTensorType([None, None]))]
onx = convert_sklearn(gpr, initial_types=initial_type, target_opset=12)

sess = rt.InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])
pred_onx = sess.run(None, {"X": X_test.astype(numpy.float32)})[0]

pred_skl = gpr.predict(X_test)
print(pred_skl[:10])
print(pred_onx[0, :10])
[199.16400146 144.28936768 109.57861328 183.99377441 169.99847412
 112.01519775 179.21099854 130.21044922 189.19836426 136.98278809]
[557056.]
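
As a side check, one can verify that the runtime now sees a fully dynamic input shape:

for inp in sess.get_inputs():
    # both dimensions should be symbolic or unset, not fixed integers
    print(inp.name, inp.shape, inp.type)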

The differences seem quite significant. Let’s confirm that by looking at the biggest ones.

diff = numpy.sort(numpy.abs(numpy.squeeze(pred_skl) - numpy.squeeze(pred_onx)))[-5:]
print(diff)
print("min(Y)-max(Y):", min(y_test), max(y_test))
[556952.34741211 556953.92102051 556957.08154297 556958.05334473
 556961.00817871]
min(Y)-max(Y): 39.0 341.0

Third attempt: use of double

The model uses a couple of matrix computations, and the matrices have coefficients with very different orders of magnitude. It is difficult to approximate the prediction made with scikit-learn if the converted model sticks to floats. Double precision is needed.

The previous code requires one change: the inputs are now declared with DoubleTensorType. With double inputs, the converter dumps every real constant matrix, such as the trained coefficients, as doubles and not as floats anymore.

initial_type = [("X", DoubleTensorType([None, None]))]
onx64 = convert_sklearn(gpr, initial_types=initial_type, target_opset=12)

sess64 = rt.InferenceSession(
    onx64.SerializeToString(), providers=["CPUExecutionProvider"]
)
pred_onx64 = sess64.run(None, {"X": X_test})[0]

print(pred_onx64[0, :10])
[199.16298882]

The new differences look much better.

diff = numpy.sort(numpy.abs(numpy.squeeze(pred_skl) - numpy.squeeze(pred_onx64)))[-5:]
print(diff)
print("min(Y)-max(Y):", min(y_test), max(y_test))
[0.00544517 0.0063448  0.00640814 0.00701725 0.00797183]
min(Y)-max(Y): 39.0 341.0
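
Relative to the spread of the target values, the largest error is now tiny:

# largest error as a fraction of the target range, about 2.6e-5 here
print(diff.max() / (max(y_test) - min(y_test)))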

Size increase

As a result, the ONNX model is almost twice as big because every coefficient is stored as a double and not as a float anymore.

size32 = len(onx.SerializeToString())
size64 = len(onx64.SerializeToString())
print("ONNX with floats:", size32)
print("ONNX with doubles:", size64)
ONNX with floats: 29226
ONNX with doubles: 57050
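
The size difference can be traced back to the initializers stored in each graph. Here is a small sketch using the onnx API to count them by element type:

from onnx import TensorProto

def initializer_types(model):
    # count the stored tensors (trained coefficients) by element type
    counts = {}
    for init in model.graph.initializer:
        name = TensorProto.DataType.Name(init.data_type)
        counts[name] = counts.get(name, 0) + 1
    return counts

print("float model: ", initializer_types(onx))
print("double model:", initializer_types(onx64))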

return_std=True

GaussianProcessRegressor is one model which defines an additional parameter for the predict function. If called with return_std=True, the class returns one more result, and that needs to be reflected in the generated ONNX graph. The converter needs to know that an extended graph is required. That’s done through the option mechanism (see Converters with options).

initial_type = [("X", DoubleTensorType([None, None]))]
options = {GaussianProcessRegressor: {"return_std": True}}
try:
    onx64_std = convert_sklearn(
        gpr, initial_types=initial_type, options=options, target_opset=12
    )
except RuntimeError as e:
    print(e)

This error highlights the fact that scikit-learn computes some internal variables on the first call to method predict. The converter needs them, so they must be initialized by calling predict at least once before converting again.

gpr.predict(X_test[:1], return_std=True)
onx64_std = convert_sklearn(
    gpr, initial_types=initial_type, options=options, target_opset=12
)

sess64_std = rt.InferenceSession(
    onx64_std.SerializeToString(), providers=["CPUExecutionProvider"]
)
pred_onx64_std = sess64_std.run(None, {"X": X_test[:5]})

pprint.pprint(pred_onx64_std)
[array([[199.16298882],
       [144.28235043],
       [109.58064467],
       [183.99437845],
       [169.99982479]]),
 array([668.01881677, 785.8806495 , 561.56004495, 541.05432348,
         0.        ])]

Let’s compare with scikit-learn prediction.

pprint.pprint(gpr.predict(X_test[:5], return_std=True))
(array([199.16400146, 144.28936768, 109.57861328, 183.99377441,
       169.99798584]),
 array([1.01514412, 1.00504366, 1.01094833, 1.01163485, 1.0076308 ]))

It looks good. Let’s run a more thorough check.

pred_onx64_std = sess64_std.run(None, {"X": X_test})
pred_std = gpr.predict(X_test, return_std=True)


diff = numpy.sort(
    numpy.abs(numpy.squeeze(pred_onx64_std[1]) - numpy.squeeze(pred_std[1]))
)[-5:]
print(diff)
[ 952.49009449  969.80341373  995.20442978 1069.49191637 1079.44581069]

There are some discrepancies, but it seems reasonable.

Versions used for this example

print("numpy:", numpy.__version__)
print("scikit-learn:", sklearn.__version__)
print("onnx: ", onnx.__version__)
print("onnxruntime: ", rt.__version__)
print("skl2onnx: ", skl2onnx.__version__)
numpy: 2.2.0
scikit-learn: 1.6.0
onnx:  1.18.0
onnxruntime:  1.21.0+cu126
skl2onnx:  1.18.0
