Probabilities or raw scores
===========================

A classifier usually returns a matrix of probabilities. By default,
*sklearn-onnx* creates an ONNX graph that returns probabilities too. When the
model implements the method *decision_function*, the graph can skip that
step and return raw scores instead. The option ``'raw_scores'`` changes the
default behaviour. Let's see that on a simple example.

Train a model and convert it
+++++++++++++++++++++++++++++

.. code-block:: default

    import numpy
    import sklearn
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    import onnxruntime as rt
    import onnx
    import skl2onnx
    from skl2onnx.common.data_types import FloatTensorType
    from skl2onnx import convert_sklearn
    from sklearn.linear_model import LogisticRegression

    iris = load_iris()
    X, y = iris.data, iris.target
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clr = LogisticRegression(max_iter=500)
    clr.fit(X_train, y_train)
    print(clr)

    initial_type = [("float_input", FloatTensorType([None, 4]))]
    onx = convert_sklearn(clr, initial_types=initial_type, target_opset=12)

Output:

.. code-block:: none

    LogisticRegression(max_iter=500)

Output type
+++++++++++

Let's use onnxruntime to confirm that the probabilities are returned as a
list of dictionaries, one per observation, each mapping a class label to its
probability.

.. code-block:: default

    sess = rt.InferenceSession(
        onx.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    res = sess.run(None, {"float_input": X_test.astype(numpy.float32)})
    print("skl", clr.predict_proba(X_test[:1]))
    print("onnx", res[1][:2])

Output:

.. code-block:: none

    skl [[9.82794559e-01 1.72053489e-02 9.16403830e-08]]
    onnx [{0: 0.9827945232391357, 1: 0.017205340787768364, 2: 9.164028114128087e-08}, {0: 0.00189912598580122, 1: 0.4566256105899811, 2: 0.541475236415863}]
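
The list of dictionaries is convenient to read but awkward to compare with
the matrix returned by scikit-learn. The sketch below stacks it back into a
2D array. The helper ``dicts_to_matrix`` is hypothetical (it is not part of
*sklearn-onnx* or onnxruntime), and the values are copied from the output
above rather than recomputed.

```python
import numpy

# Values copied from the onnxruntime output shown above.
onnx_probas = [
    {0: 0.9827945232391357, 1: 0.017205340787768364, 2: 9.164028114128087e-08},
    {0: 0.00189912598580122, 1: 0.4566256105899811, 2: 0.541475236415863},
]


def dicts_to_matrix(rows):
    """Hypothetical helper: stack a list of {label: value} dicts into a 2D array."""
    # Sort the labels so that the columns follow a stable class order.
    labels = sorted(rows[0])
    matrix = numpy.array([[row[label] for label in labels] for row in rows])
    return labels, matrix


labels, probas = dicts_to_matrix(onnx_probas)
print(labels)
print(probas.shape)
print(probas.sum(axis=1))
```

Each row should sum to approximately one, which is a quick sanity check that
the values are probabilities and not raw scores.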

Raw scores and decision_function
++++++++++++++++++++++++++++++++

With option ``'raw_scores'`` set to ``True``, the converted model returns the
values computed by *decision_function* instead of probabilities.

.. code-block:: default

    initial_type = [("float_input", FloatTensorType([None, 4]))]
    options = {id(clr): {"raw_scores": True}}
    onx2 = convert_sklearn(
        clr, initial_types=initial_type, options=options, target_opset=12
    )
    sess2 = rt.InferenceSession(
        onx2.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    res2 = sess2.run(None, {"float_input": X_test.astype(numpy.float32)})
    print("skl", clr.decision_function(X_test[:1]))
    print("onnx", res2[1][:2])

Output:

.. code-block:: none

    skl [[ 6.74440614  2.69922635 -9.44363249]]
    onnx [{0: 6.744406700134277, 1: 2.6992263793945312, 2: -9.443633079528809}, {0: -3.7117910385131836, 1: 1.770678997039795, 2: 1.9411125183105469}]
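
The raw scores and the probabilities printed earlier describe the same
prediction. As a rough check, and assuming the logistic regression was fitted
in its multinomial form (so that probabilities are a softmax of the scores),
the sketch below applies a softmax to the raw scores of the first observation
and compares the result with the probabilities shown above. The function
``softmax`` is written here for illustration, and other classifiers or a
one-vs-rest setting would need a different link function.

```python
import numpy

# Raw scores copied from the decision_function output shown above.
raw_scores = numpy.array([[6.74440614, 2.69922635, -9.44363249]])
# Probabilities printed earlier by predict_proba for the same observation.
expected_probas = numpy.array([[9.82794559e-01, 1.72053489e-02, 9.16403830e-08]])


def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = numpy.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


probas = softmax(raw_scores)
print(probas)
# Assumption: the multinomial (softmax) link applies to this model.
print(numpy.allclose(probas, expected_probas, atol=1e-6))
```

Returning raw scores from the ONNX graph leaves this last transformation to
the caller, which can be useful when another calibration or threshold is
applied afterwards.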

**Versions used for this example**

.. code-block:: default

    print("numpy:", numpy.__version__)
    print("scikit-learn:", sklearn.__version__)
    print("onnx: ", onnx.__version__)
    print("onnxruntime: ", rt.__version__)
    print("skl2onnx: ", skl2onnx.__version__)

Output:

.. code-block:: none

    numpy: 1.23.5
    scikit-learn: 1.4.dev0
    onnx: 1.15.0
    onnxruntime: 1.16.0+cu118
    skl2onnx: 1.15.0