Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
Onnx-mlir provides runtime utilities to compile and run ONNX models in Python.
These utilities are implemented by the OMCompile compiler interface
(src/Runtime/OMCompile.h) and the ExecutionSession class
(src/Runtime/ExecutionSession.hpp).
Both utilities have associated Python bindings generated with the pybind library.
Using pybind, a C/C++ binary can be directly imported by the Python interpreter. For onnx-mlir, there are five such libraries: one to compile onnx-mlir models, two to run the models, and the other two to both compile and run the models.
- PyExecutionSession (src/Runtime/PyExecutionSession.hpp), built as a shared library at build/Debug/lib/PyRuntimeC.cpython-<target>.so.
- PyOMCompile (src/Runtime/PyOMCompile.hpp), built as a shared library at build/Debug/lib/PyOMCompileC.cpython-<target>.so.

The modules can be imported normally by the Python interpreter as long as they are in your PYTHONPATH. Another alternative is to create symbolic links to them in your working directory.
```shell
cd <working directory>
ln -s <path to the shared library to run onnx-mlir models> .      # e.g. build/Debug/lib/PyRuntimeC.cpython-<target>.so
ln -s <path to the Python library to run onnx-mlir models> .      # e.g. src/Runtime/python/PyRuntime.py
ln -s <path to the shared library to compile onnx-mlir models> .  # e.g. build/Debug/lib/PyOMCompileC.cpython-<target>.so
ln -s <path to the Python library to compile onnx-mlir models> .  # e.g. src/Runtime/python/PyOMCompile.py
python3
```
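As an alternative to symbolic links, the directories containing the shared libraries and the Python wrappers can be added to PYTHONPATH. The paths below are illustrative and assume a Debug build tree; adjust ONNX_MLIR_ROOT to your own checkout:

```shell
# Assumed layout of an onnx-mlir Debug build; adjust ONNX_MLIR_ROOT to your tree.
export ONNX_MLIR_ROOT=$HOME/onnx-mlir
export PYTHONPATH=$ONNX_MLIR_ROOT/build/Debug/lib:$ONNX_MLIR_ROOT/src/Runtime/python:$PYTHONPATH
```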
An ONNX model is a computation graph, and the graph often has a single entry point to trigger the computation. Below is an example of doing inference for a model that has a single entry point.
```python
import numpy as np
from PyRuntime import OMExecutionSession

model = 'model.so'  # LeNet from the ONNX Model Zoo, compiled with onnx-mlir.

# Create a session for this model.
session = OMExecutionSession(shared_lib_path=model)

# Input and output signatures of the default entry point.
print("input signature in json", session.input_signature())
print("output signature in json", session.output_signature())

# Do inference using the default entry point.
a = np.full((1, 1, 28, 28), 1, np.dtype(np.float32))
try:
    outputs = session.run(inputs=[a])
    for output in outputs:
        print(output.shape)
except RuntimeError as e:
    print(f"Inference failed: {e}")
```
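The signature strings returned by input_signature() and output_signature() are JSON and can be parsed with the standard json module. The signature string below is a hypothetical example of the format; the exact field names and layout may vary across onnx-mlir versions, so inspect the real output of session.input_signature() on your model:

```python
import json

# Hypothetical signature string; real output comes from session.input_signature().
sig = '[ { "type" : "f32" , "dims" : [1, 1, 28, 28] , "name" : "input_0" } ]'

# Parse the JSON and inspect each input tensor's description.
inputs = json.loads(sig)
for tensor in inputs:
    print(tensor["name"], tensor["type"], tensor["dims"])
```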
In case a computation graph has multiple entry points, users have to set a specific entry point to do inference. Below is an example of doing inference with multiple entry points.
```python
import numpy as np
from PyRuntime import OMExecutionSession

model = 'multi-entry-points-model.so'

# Create a session for this model; False to manually set an entry point.
session = OMExecutionSession(shared_lib_path=model, use_default_entry_point=False)

# Query entry points in the model.
entry_points = session.entry_points()
for entry_point in entry_points:
    # Set the entry point to do inference.
    session.set_entry_point(name=entry_point)
    # Input and output signatures of the current entry point.
    print("input signature in json", session.input_signature())
    print("output signature in json", session.output_signature())
    # Do inference using the current entry point.
    a = np.arange(10).astype('float32')
    b = np.arange(10).astype('float32')
    try:
        outputs = session.run(inputs=[a, b])
        for output in outputs:
            print(output.shape)
    except RuntimeError as e:
        print(f"Inference failed: {e}")
```
If a model was compiled with --tag, the value of --tag must be passed to OMExecutionSession.
Tags are useful when multiple sessions for multiple models coexist in the same Python script.
Below is an example of doing multiple inferences using tags.
```python
import numpy as np
from PyRuntime import OMExecutionSession

encoder_model = 'encoder/model.so'  # Assumed compiled with `--tag=encoder`.
decoder_model = 'decoder/model.so'  # Assumed compiled with `--tag=decoder`.

# Create a session for the encoder model.
encoder_sess = OMExecutionSession(shared_lib_path=encoder_model, tag="encoder")
# Create a session for the decoder model.
decoder_sess = OMExecutionSession(shared_lib_path=decoder_model, tag="decoder")
```
If two models were NOT compiled with --tag, they must be compiled into .so files
with different names if they are to be used in the same process: when no tag is
given, the file name is used as the default tag.
Below is an example of doing multiple inferences without using tags.
```python
import numpy as np
from PyRuntime import OMExecutionSession

encoder_model = 'my_encoder.so'
decoder_model = 'my_decoder.so'

# Create a session for the encoder model; tag defaults to `my_encoder`.
encoder_sess = OMExecutionSession(shared_lib_path=encoder_model)
# Create a session for the decoder model; tag defaults to `my_decoder`.
decoder_sess = OMExecutionSession(shared_lib_path=decoder_model)
```
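The default tag is the shared library's file name without its directory components or extension. A minimal sketch of that derivation (an illustration, not the actual onnx-mlir implementation):

```python
from pathlib import Path

def default_tag(shared_lib_path: str) -> str:
    # File name without directories or extension, e.g. 'my_encoder.so' -> 'my_encoder'.
    return Path(shared_lib_path).stem

print(default_tag('encoder/my_encoder.so'))  # my_encoder
print(default_tag('my_decoder.so'))          # my_decoder
```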
To use functions without tags, e.g. `run_main_graph`, set `tag="NONE"`.
The complete interface to OMExecutionSession can be seen in the sources mentioned previously.
However, using the constructor and run method is enough to perform inferences.
For detailed API documentation with all parameters, examples, and error handling, use Python’s help system:
```python
from PyRuntime import OMExecutionSession
help(OMExecutionSession)
help(OMExecutionSession.run)
```
```python
def __init__(self, shared_lib_path: str, tag: str = "", use_default_entry_point: bool = True):
    """
    Args:
        shared_lib_path: Relative or absolute path to your .so model.
        tag: A string that was passed to `--tag` when compiling the .so model.
            By default, it is the output file name without its extension,
            namely, `filename` in `filename.so`.
        use_default_entry_point: Use the default entry point ('run_main_graph')
            or not. Set to True by default.
    """

def run(self, inputs: List[ndarray]) -> List[ndarray]:
    """
    Run inference on the model.

    Args:
        inputs: A list of NumPy arrays, the inputs of your model.

    Returns:
        A list of NumPy arrays, the outputs of your model.
    """

def run_debug(self, inputs: List[ndarray],
              with_signal_handler: bool = False,
              force_output_data_copy: bool = False) -> List[ndarray]:
    """
    Run inference with debugging options enabled.

    Args:
        inputs: A list of NumPy arrays, the inputs of your model.
        with_signal_handler: Enable signal handler for catching crashes (POSIX only).
        force_output_data_copy: Force copying of output data (for debugging).

    Returns:
        A list of NumPy arrays, the outputs of your model.
    """

def input_signature(self) -> str:
    """
    Returns:
        A string containing a JSON representation of the model's input signature.
    """

def output_signature(self) -> str:
    """
    Returns:
        A string containing a JSON representation of the model's output signature.
    """

def entry_points(self) -> List[str]:
    """
    Returns:
        A list of entry point names available in the model.
    """

def set_entry_point(self, name: str):
    """
    Set the active entry point for inference.

    Args:
        name: An entry point name from entry_points().
    """

def print_instrumentation(self):
    """
    Print instrumentation data from model execution.

    If the model was compiled with instrumentation enabled, prints performance
    metrics and profiling information. Does nothing if instrumentation is not available.
    """
```
An ONNX model can be compiled directly from the command line. The resulting library can then be executed using Python as shown in the previous sections. At times, it might be convenient to also compile a model directly in Python. This section explores the Python methods to do so.
The OMCompile constructor takes the input model path and compilation flags. Compilation happens during object construction.
```python
from PyOMCompile import OMCompile

# Load the ONNX model, compile it, and create an OMCompile object.
try:
    compiler = OMCompile('./mnist.onnx', '-O3 -o mnist')
except RuntimeError as e:
    print(f"Compilation failed: {e}")
    exit(1)

# Get the output file name.
compiled_model = compiler.get_output_file_name()
print("Compiled onnx file to", compiled_model)
```
The PyOMCompile module exports the OMCompile class to drive the
compilation of an ONNX model into an executable model.
The compiler object is created by providing the input model file name and compilation flags.
Compilation occurs during construction, and the resulting output file name can be retrieved
with the get_output_file_name() method. Because different operating systems may use
different suffixes for libraries, always use this method to get the actual output file name.
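Shared library suffixes do differ by platform (for example .so on Linux, .dylib on macOS, .dll on Windows), which is why the output name should come from get_output_file_name() rather than being constructed by hand. A small illustration of the platform differences, independent of onnx-mlir's own logic:

```python
import sys

# Typical shared-library suffix per platform (illustrative, not onnx-mlir's logic).
suffixes = {"linux": ".so", "darwin": ".dylib", "win32": ".dll"}
suffix = suffixes.get(sys.platform, ".so")
print("on this platform, a compiled model library would end in", suffix)
```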
The complete interface to OMCompile can be seen in the sources mentioned previously. However, using the constructor and the methods below is enough to compile models.
For detailed API documentation with all parameters, examples, and error handling, use Python’s help system:
```python
from PyOMCompile import OMCompile
help(OMCompile)
```
```python
def __init__(self, input_model_path: str, flags: str,
             compiler_path: str = "", log_file_name: str = "",
             reuse_compiled_model: bool = False):
    """
    Compile an ONNX model.

    Args:
        input_model_path: Relative or absolute path to your ONNX/MLIR model file.
        flags: Compilation flags as a single string (e.g., '-O3', '-O3 -o output_name').
        compiler_path: Path to the onnx-mlir compiler binary. If empty, use the default location.
        log_file_name: Path to a log file for compilation output. If empty, output goes to stdout/stderr.
        reuse_compiled_model: If True, reuse an existing compiled model if it exists. Default: False.

    Raises:
        RuntimeError: If the model file doesn't exist, compilation fails, or no input model is provided.
    """

def get_output_file_name(self) -> str:
    """
    Get the output file name of the compiled model.

    Returns:
        Full path to the compiled model output file.

    Raises:
        RuntimeError: If the compilation failed.
    """

def get_output_constant_file_name(self) -> str:
    """
    Get the output file name of the compiled model's constant file, if any.

    Returns:
        Full path to the constant file of the compiled model, or an empty string if none.

    Raises:
        RuntimeError: If the compilation failed.
    """

def get_model_tag(self) -> str:
    """
    Get the model tag for the compiled model.

    Returns:
        Model tag string based on compilation flags.

    Raises:
        RuntimeError: If the compilation failed.
    """
```