.. GENERATED FROM PYTHON SOURCE LINES 56-60

Conversion to ONNX
++++++++++++++++++

Function *to_onnx* does not handle dataframes.

.. GENERATED FROM PYTHON SOURCE LINES 60-64

.. code-block:: Python

    onx = to_onnx(pipe, train_data[:1], options={RandomForestClassifier: {"zipmap": False}})

.. GENERATED FROM PYTHON SOURCE LINES 65-69

Prediction with ONNX
++++++++++++++++++++

*onnxruntime* does not support dataframes.

.. GENERATED FROM PYTHON SOURCE LINES 69-105

.. code-block:: Python

    sess = InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])
    try:
        sess.run(None, train_data)
    except Exception as e:
        print(e)

    # Unhide conversion logic with a dataframe
    # ++++++++++++++++++++++++++++++++++++++++
    #
    # A dataframe can be seen as a set of columns with
    # different types. That's what ONNX should see:
    # a list of inputs, the input name is the column name,
    # the input type is the column type.


    def guess_schema_from_data(X):
        init = guess_initial_types(X)
        unique = set()
        for _, col in init:
            if len(col.shape) != 2:
                return init
            if col.shape[0] is not None:
                return init
            if len(unique) > 0 and col.__class__ not in unique:
                return init
            unique.add(col.__class__)
        unique = list(unique)
        return [("X", unique[0]([None, sum(_[1].shape[1] for _ in init)]))]


    init = guess_schema_from_data(train_data)

    pprint.pprint(init)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    run(): incompatible function arguments. The following argument types are supported:
        1. (self: onnxruntime.capi.onnxruntime_pybind11_state.InferenceSession, arg0: list[str], arg1: dict[str, object], arg2: onnxruntime.capi.onnxruntime_pybind11_state.RunOptions) -> list

    Invoked with: , ['label', 'probabilities'],   CAT1 CAT2  num1  num2
    0    a    c  0.50  0.60
    1    b    d  0.40  0.80
    2    a    d  0.50  0.56
    3    a    d  0.55  0.56
    4    a    c  0.35  0.86
    5    a    c  0.50  0.68, None
    [('CAT1', StringTensorType(shape=[None, 1])),
     ('CAT2', StringTensorType(shape=[None, 1])),
     ('num1', DoubleTensorType(shape=[None, 1])),
     ('num2', DoubleTensorType(shape=[None, 1]))]

.. GENERATED FROM PYTHON SOURCE LINES 106-107

The numeric columns are detected as doubles. Let's use float instead.

.. GENERATED FROM PYTHON SOURCE LINES 107-117

.. code-block:: Python

    for c in train_data.columns:
        if c not in cat_cols:
            train_data[c] = train_data[c].astype(numpy.float32)

    init = guess_schema_from_data(train_data)
    pprint.pprint(init)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [('CAT1', StringTensorType(shape=[None, 1])),
     ('CAT2', StringTensorType(shape=[None, 1])),
     ('num1', FloatTensorType(shape=[None, 1])),
     ('num2', FloatTensorType(shape=[None, 1]))]

.. GENERATED FROM PYTHON SOURCE LINES 118-119

Let's convert with *skl2onnx* only.

.. GENERATED FROM PYTHON SOURCE LINES 119-124

.. code-block:: Python

    onx2 = to_onnx(
        pipe, initial_types=init, options={RandomForestClassifier: {"zipmap": False}}
    )

.. GENERATED FROM PYTHON SOURCE LINES 125-129

Let's run it with *onnxruntime*.
We need to convert the dataframe into a dictionary
where column names become keys, and column
values become values.

.. GENERATED FROM PYTHON SOURCE LINES 129-133

.. code-block:: Python

    inputs = {c: train_data[c].values.reshape((-1, 1)) for c in train_data.columns}
    pprint.pprint(inputs)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'CAT1': array([['a'],
           ['b'],
           ['a'],
           ['a'],
           ['a'],
           ['a']], dtype=object),
     'CAT2': array([['c'],
           ['d'],
           ['d'],
           ['d'],
           ['c'],
           ['c']], dtype=object),
     'num1': array([[0.5 ],
           [0.4 ],
           [0.5 ],
           [0.55],
           [0.35],
           [0.5 ]], dtype=float32),
     'num2': array([[0.6 ],
           [0.8 ],
           [0.56],
           [0.56],
           [0.86],
           [0.68]], dtype=float32)}

.. GENERATED FROM PYTHON SOURCE LINES 134-135

Inference.

.. GENERATED FROM PYTHON SOURCE LINES 135-143

.. code-block:: Python

    sess2 = InferenceSession(onx2.SerializeToString(), providers=["CPUExecutionProvider"])

    got2 = sess2.run(None, inputs)

    print(pipe.predict(train_data))
    print(got2[0])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [0 1 0 1 0 1]
    [0 1 0 1 0 1]

.. GENERATED FROM PYTHON SOURCE LINES 144-145

And probabilities.

.. GENERATED FROM PYTHON SOURCE LINES 145-148

.. code-block:: Python

    print(pipe.predict_proba(train_data))
    print(got2[1])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [[0.84 0.16]
     [0.32 0.68]
     [0.68 0.32]
     [0.17 0.83]
     [0.77 0.23]
     [0.36 0.64]]
    [[0.84000003 0.16      ]
     [0.32000035 0.67999965]
     [0.68000007 0.31999996]
     [0.1700005  0.8299995 ]
     [0.77       0.23000003]
     [0.3600003  0.6399997 ]]
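
The two probability arrays agree closely but not exactly; the small gaps are consistent with float32 rounding in the ONNX output. A minimal sketch of a tolerance-based comparison follows, using the values printed in the output. The names ``sk_proba`` and ``onnx_proba`` and the tolerance ``atol=1e-5`` are assumptions chosen for illustration and are not part of the original example.

```python
import numpy

# Probabilities printed by pipe.predict_proba (values copied from the output).
sk_proba = numpy.array(
    [[0.84, 0.16], [0.32, 0.68], [0.68, 0.32], [0.17, 0.83], [0.77, 0.23], [0.36, 0.64]]
)

# Probabilities printed for the ONNX model (values copied from the output).
onnx_proba = numpy.array(
    [
        [0.84000003, 0.16],
        [0.32000035, 0.67999965],
        [0.68000007, 0.31999996],
        [0.1700005, 0.8299995],
        [0.77, 0.23000003],
        [0.3600003, 0.6399997],
    ],
    dtype=numpy.float32,
)

# Largest absolute gap between the two sets of probabilities.
max_diff = float(numpy.abs(sk_proba - onnx_proba).max())
print(max_diff)

# Raises AssertionError if any entry differs by more than the tolerance.
# atol=1e-5 is an illustrative choice, not a value from the original example.
numpy.testing.assert_allclose(sk_proba, onnx_proba, atol=1e-5)
```

In a real script the same check could be applied directly to ``pipe.predict_proba(train_data)`` and ``got2[1]``, while the predicted labels can be compared exactly.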