Introduction

ort is an open-source Rust binding for ONNX Runtime.

⚠️

These docs are for the latest release candidate of ort, 2.0.0-rc.2. This version is production-ready (just not API stable) and we recommend new & existing projects use it.

ort makes it easy to deploy your machine learning models to production via ONNX Runtime, a hardware-accelerated inference engine. With ort + ONNX Runtime, you can run almost any ML model (including ResNet, YOLOv8, BERT, LLaMA) on almost any hardware, often far faster than PyTorch, and with the added bonus of Rust's efficiency.

ONNX is an interoperable neural network specification. Your ML framework of choice (PyTorch, TensorFlow, Keras, PaddlePaddle, etc.) turns your model into an ONNX graph composed of basic operations like MatMul or Add. This graph can then be converted into a model in another framework, or run directly with ONNX Runtime.

An example visual representation of an ONNX graph, showing how an input tensor flows through layers of convolution nodes.
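
To make the idea of a graph of basic operations concrete, here is a toy sketch in plain Rust. It is not the ONNX format and does not use ort or ONNX Runtime; the `Tensor`, `Op`, `Node`, and `run_graph` names are hypothetical and exist only for this illustration. It assumes a graph can be modelled as an ordered list of nodes that read and write named tensors.

```rust
// Toy illustration only: this is NOT the ONNX format or ONNX Runtime's API.
// A graph is modelled as an ordered list of nodes, each applying one basic
// operation (MatMul or Add) to named tensors.
use std::collections::HashMap;

/// A dense row-major 2-D tensor (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Tensor {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

/// The two basic operations used in this sketch.
enum Op {
    MatMul,
    Add,
}

/// One graph node: reads two named tensors and writes one named tensor.
struct Node {
    op: Op,
    inputs: [&'static str; 2],
    output: &'static str,
}

fn matmul(a: &Tensor, b: &Tensor) -> Tensor {
    assert_eq!(a.cols, b.rows, "inner dimensions must match");
    let mut data = vec![0.0; a.rows * b.cols];
    for i in 0..a.rows {
        for j in 0..b.cols {
            for k in 0..a.cols {
                data[i * b.cols + j] += a.data[i * a.cols + k] * b.data[k * b.cols + j];
            }
        }
    }
    Tensor { rows: a.rows, cols: b.cols, data }
}

fn add(a: &Tensor, b: &Tensor) -> Tensor {
    assert_eq!((a.rows, a.cols), (b.rows, b.cols), "shapes must match");
    let data = a.data.iter().zip(&b.data).map(|(x, y)| x + y).collect();
    Tensor { rows: a.rows, cols: a.cols, data }
}

/// Runs the nodes in order, storing each result under its output name.
/// Panics if a node refers to a tensor name that has not been produced yet.
fn run_graph(
    nodes: &[Node],
    mut values: HashMap<&'static str, Tensor>,
) -> HashMap<&'static str, Tensor> {
    for node in nodes {
        let a = &values[node.inputs[0]];
        let b = &values[node.inputs[1]];
        let out = match node.op {
            Op::MatMul => matmul(a, b),
            Op::Add => add(a, b),
        };
        values.insert(node.output, out);
    }
    values
}
```

A two-node graph (a MatMul followed by an Add) expressed this way computes `y = x * w + b`. A real runtime works on the same kind of structure, which is what lets it analyse and optimize the graph before running it.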

Converting a neural network to a graph representation like ONNX opens the door to more optimizations and broader acceleration hardware support. ONNX Runtime can significantly improve the inference speed/latency of most models and enable acceleration with NVIDIA CUDA & TensorRT, Intel OpenVINO, Qualcomm QNN, Huawei CANN, and much more.

ort is the Rust gateway to ONNX Runtime, allowing you to run inference on your ONNX models via an easy-to-use and ergonomic API. Many commercial, open-source, & research projects use ort in some pretty serious production scenarios to boost inference performance.

Getting started

Add ort to your Cargo.toml

If you have a supported platform (and you probably do), installing ort couldn't be any simpler! Just add it to your Cargo dependencies:

[dependencies]
ort = "2.0.0-rc.2"

Convert your model

Your model will need to be converted to an ONNX graph before you can use it.

Load your model

Once you've got a model, load it via ort by creating a Session:

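use ort::{GraphOptimizationLevel, Session};

// Build a session: apply the highest graph optimization level and
// use 4 threads for parallelism within individual operators.
let model = Session::builder()?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(4)?
    .commit_from_file("yolov8m.onnx")?;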

Perform inference

Preprocess your inputs, then run() the session to perform inference.

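// `image` is the preprocessed input tensor, bound to the model input named "image".
let outputs = model.run(ort::inputs!["image" => image]?)?;
// Read the output named "output0" as a tensor of f32 values.
let predictions = outputs["output0"].try_extract_tensor::<f32>()?;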
...
There are some more useful examples in the ort repo!
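
What "preprocess your inputs" means depends entirely on the model. As a rough sketch, many vision models expect a channel-first `f32` tensor with pixel values scaled to [0, 1], while decoded images are usually interleaved 8-bit RGB. The helper below is hypothetical (`hwc_u8_to_chw_f32` is not part of ort) and uses only the standard library; both the layout and the scaling are assumptions to check against your own model.

```rust
/// Converts interleaved 8-bit RGB pixels (height x width x 3) into a planar,
/// channel-first f32 buffer (3 x height x width) scaled to [0, 1].
/// Hypothetical helper: the layout and scaling are assumptions about the
/// model, not something ort requires.
fn hwc_u8_to_chw_f32(pixels: &[u8], width: usize, height: usize) -> Vec<f32> {
    const CHANNELS: usize = 3;
    assert_eq!(
        pixels.len(),
        width * height * CHANNELS,
        "buffer size does not match width * height * 3"
    );
    let plane = width * height;
    let mut out = vec![0.0f32; plane * CHANNELS];
    // Each chunk is one pixel: [R, G, B]. Pixel `i` goes to position `i`
    // in each of the three channel planes.
    for (i, px) in pixels.chunks_exact(CHANNELS).enumerate() {
        for (c, &value) in px.iter().enumerate() {
            out[c * plane + i] = f32::from(value) / 255.0;
        }
    }
    out
}
```

The returned buffer corresponds to a tensor of shape [1, 3, height, width] once a batch dimension is added. How it is wrapped into a session input varies between ort versions, so that step is left out here.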

Next steps

Unlock more performance with EPs

Use execution providers to enable hardware acceleration in your app and unlock the full power of your GPU or NPU.

Show off your project!

We'd love to see what you've made with ort! Show off your project in GitHub Discussions or on our Discord.