External Data

Loading an ONNX Model with External Data

  • [Default] If the external data is in the same directory as the model, simply use onnx.load().

import onnx

onnx_model = onnx.load("path/to/the/model.onnx")

  • If the external data is in a different directory, call onnx.load() with load_external_data=False, then use load_external_data_for_model() to load the data from that directory.

import onnx
from onnx.external_data_helper import load_external_data_for_model

onnx_model = onnx.load("path/to/the/model.onnx", load_external_data=False)
load_external_data_for_model(onnx_model, "data/directory/path/")
# onnx_model now contains the external data loaded from the specified directory

Converting an ONNX Model to External Data

import onnx
from onnx.external_data_helper import convert_model_to_external_data

onnx_model = ... # Your model in memory as ModelProto
convert_model_to_external_data(onnx_model, all_tensors_to_one_file=True, location="filename", size_threshold=1024, convert_attribute=False)
# Must be followed by save_model to save the converted model to a specific path
onnx.save_model(onnx_model, "path/to/save/the/model.onnx")
# The raw data in onnx_model has now been converted to external data and saved in the model's directory

Converting and Saving an ONNX Model to External Data

import onnx

onnx_model = ... # Your model in memory as ModelProto
onnx.save_model(onnx_model, "path/to/save/the/model.onnx", save_as_external_data=True, all_tensors_to_one_file=True, location="filename", size_threshold=1024, convert_attribute=False)
# The raw data in onnx_model has now been converted to external data and saved in the model's directory

onnx.checker for Models with External Data

Models with External Data (<2GB)

The checker supports models with external data. Pass either the loaded ONNX model or the model path to the checker.

Large models >2GB

For models larger than 2GB, however, pass the model path to onnx.checker. The external data must be in the same directory as the model.

import onnx

onnx.checker.check_model("path/to/the/model.onnx")
# onnx.checker.check_model(loaded_onnx_model) will fail if given >2GB model

TensorProto: data_location and external_data fields

The TensorProto message type has two fields related to external data.

data_location field

The data_location field stores the location of the data for this tensor. The value MUST be one of:

  • MESSAGE - data stored in type-specific fields inside the protobuf message.

  • RAW - data stored in raw_data field.

  • EXTERNAL - data stored in an external location as described by external_data field.

  • value not set - legacy value. Assume data is stored in raw_data (if set), otherwise in the message.
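
The rules above, including the legacy fallback, can be sketched as a small decision function. This is only an illustration: resolve_storage and its string arguments are hypothetical stand-ins and not part of the onnx Python API.

```python
def resolve_storage(data_location, has_raw_data):
    """Return where a tensor's data lives: "message", "raw_data" or "external".

    data_location is one of "MESSAGE", "RAW", "EXTERNAL", or None when the
    field is not set (hypothetical string stand-ins for the listed values).
    """
    if data_location == "MESSAGE":
        return "message"
    if data_location == "RAW":
        return "raw_data"
    if data_location == "EXTERNAL":
        return "external"
    if data_location is None:
        # Legacy case: prefer raw_data when it is set, else the typed fields.
        return "raw_data" if has_raw_data else "message"
    raise ValueError(f"unknown data_location: {data_location!r}")


legacy_with_raw = resolve_storage(None, has_raw_data=True)
legacy_without_raw = resolve_storage(None, has_raw_data=False)
```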

external_data field

The external_data field stores key-value pairs of strings describing the data location.

Recognized keys are:

  • "location" (required) - file path relative to the filesystem directory where the ONNX protobuf model was stored. Up-directory path components such as … are disallowed and should be stripped when parsing.

  • "offset" (optional) - position of byte at which stored data begins. Integer stored as string. Offset values SHOULD be multiples of 4096 (page size) to enable mmap support.

  • "length" (optional) - number of bytes containing data. Integer stored as string.

  • "checksum" (optional) - SHA1 digest of the file specified under the "location" key.
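
To make the keys concrete, here is a minimal standard-library sketch of how a writer might produce them: each tensor is appended at a page-aligned offset, and the SHA1 digest is computed over the finished file. The helper names, the file name "weights.bin", and the byte payloads are hypothetical, and this is not how the onnx package itself is implemented.

```python
import hashlib
import os
import tempfile

PAGE_SIZE = 4096  # alignment suggested for "offset" to enable mmap


def align_up(position, alignment=PAGE_SIZE):
    """Round position up to the next multiple of alignment."""
    return (position + alignment - 1) // alignment * alignment


def append_tensor_bytes(data_path, payload):
    """Append payload at a page-aligned offset; return (offset, length) as strings."""
    current_size = os.path.getsize(data_path) if os.path.exists(data_path) else 0
    offset = align_up(current_size)
    with open(data_path, "ab") as f:
        f.write(b"\x00" * (offset - current_size))  # zero padding up to the offset
        f.write(payload)
    return str(offset), str(len(payload))


def sha1_of_file(path):
    """Return the hex SHA1 digest of the whole file."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as model_dir:
    data_path = os.path.join(model_dir, "weights.bin")
    entries = []
    for payload in (b"\x01" * 10, b"\x02" * 20):
        offset, length = append_tensor_bytes(data_path, payload)
        entries.append({"location": "weights.bin", "offset": offset, "length": length})
    # The checksum covers the whole file, so compute it after all writes.
    checksum = sha1_of_file(data_path)
    for entry in entries:
        entry["checksum"] = checksum
    file_size = os.path.getsize(data_path)
```

Every value in the resulting entries is a string, matching the "Integer stored as string" convention above.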

After an ONNX file is loaded, all external_data fields may be updated with an additional key ("basepath"), which stores the path to the directory from which the ONNX model file was loaded.

External data files

Data stored in external data files uses the same binary byte-string format as the raw_data field in current ONNX implementations.
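
Because the file holds the same bytes that raw_data would, a reader only needs to seek to the offset, read the given length, and decode the bytes according to the tensor's element type. The sketch below uses only the standard library. read_external_bytes and the file name "weights.bin" are hypothetical, and the decoding step assumes three little-endian float32 values.

```python
import os
import struct
import tempfile
from pathlib import PurePath


def read_external_bytes(base_dir, external_data):
    """Return the raw bytes described by a dict of external_data key-value strings."""
    location = external_data["location"]  # the only required key
    if ".." in PurePath(location).parts:
        raise ValueError("up-directory components are not allowed in 'location'")
    offset = int(external_data.get("offset", "0"))  # integers are stored as strings
    with open(os.path.join(base_dir, location), "rb") as f:
        f.seek(offset)
        if "length" in external_data:
            return f.read(int(external_data["length"]))
        return f.read()  # no length given: read to the end of the file


with tempfile.TemporaryDirectory() as base_dir:
    # Build a data file: one page of padding, then three float32 values.
    payload = struct.pack("<3f", 1.0, 2.0, 3.0)
    with open(os.path.join(base_dir, "weights.bin"), "wb") as f:
        f.write(b"\x00" * 4096)
        f.write(payload)

    entry = {"location": "weights.bin", "offset": "4096", "length": "12"}
    raw = read_external_bytes(base_dir, entry)
    values = struct.unpack("<3f", raw)
```

Here base_dir plays the role of the "basepath" key: the directory against which "location" is resolved.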

Reference https://github.com/onnx/onnx/pull/678