ONNX Runtime

Informally, the ONNX runtime consists of a computation graph and a heap-like memory where tensors reside. The runtime evaluates the graph in a dataflow fashion by executing each node once its inputs are ready, updating memory with outputs. The memory and the graph together form the runtime machine.

Definition 1.1 (ONNX Runtime State). The machine state of an ONNX runtime consists of a pair: $(G, M)$. $G$ is a directed acyclic computation graph whose nodes represent operator invocations (e.g., Add, MatMul, Relu) and whose edges represent tensor data dependencies. $M$ is a linear, read-write, byte-addressable memory array of size N bytes $(M: [0..N) → u8)$, used to store all tensor data (inputs, intermediate results, and outputs). Each tensor occupies a contiguous region within $M$, and tensor metadata (such as shape and type) is tracked separately in a tensor table.

Definition 1.2 (ONNX node format). Any ONNX node (operator) can be written in the following format: $[op\_type, inputs, outputs, attributes]$, where:

$op\_type$: a string identifying the operator (e.g., "$Add$", "$Relu$")
$inputs$: a list of input tensor names (each mapped to memory)
$outputs$: a list of output tensor names
$attributes$: constant parameters specific to the operator (e.g., axis in $Softmax$)

Definition 1.3 (ONNX Step Transition). Given:

A machine state $(G, M)$, where $G$ is the current computation graph and $M$ is linear memory
A topologically sorted node list $[n₀, n₁, ..., n_k]$
A tensor table $T$ mapping tensor names to $(offset, dtype, shape)$ in memory

We define the step transition for ONNX as:

Select node $nᵢ = [op_type, inputs, outputs, attributes, subgraph]$ from the graph (in topological order).
Read input tensors $x₀, ..., x_k$ from memory $M$ using tensor metadata in $T$. Each tensor is loaded via its memory offset and interpreted by $dtype$.
Apply operator function $f_{op}$ defined by op_type, using $inputs$ and $attributes$: $[y₀, ..., y_m] ← f_{op}(x₀, ..., x_k, attributes)$
Write outputs $y₀, ..., y_m$ to memory $M$, assigning memory locations using $T$. Update memory values at corresponding offsets.
Advance to next node in G. Repeat until all nodes are executed.

PreviousONNX NextQuantization

Last updated 1 month ago