ort is an open-source Rust binding for ONNX Runtime.

ort makes it easy to deploy your machine learning models to production via ONNX Runtime, a hardware-accelerated inference engine. With ort + ONNX Runtime, you can run almost any ML model (including ResNet, YOLOv8, BERT, LLaMA) on almost any hardware, often far faster than PyTorch, and with the added bonus of Rust’s efficiency.

These docs are for the latest alpha version of ort, 2.0.0-alpha.4. This version is production-ready (just not API-stable), and we recommend new & existing projects use it.

Why ort?

There are a few other ONNX Runtime crates out there, so why use ort?

For one, ort simply supports more features:

| Feature comparison        | 📕 ort  | 📗 ors  | 🪟 onnxruntime-rs |
|---------------------------|---------|---------|-------------------|
| Upstream version          | v1.17.0 | v1.12.0 | v1.8              |
| dlopen()?                 | ✅      | ✅      | ❌                |
| Execution providers?      | ✅      | ❌      | ❌                |
| I/O Binding?              | ✅      | ❌      | ❌                |
| String tensors?           | ✅      | ❌      | ⚠️ input only     |
| Multiple output types?    | ✅      | ✅      | ❌                |
| Multiple input types?     | ✅      | ✅      | ❌                |
| In-memory session?        | ✅      | ✅      | ✅                |
| WebAssembly?              | ✅      | ❌      | ❌                |
| Provides static binaries? | ✅      | ❌      | ❌                |
| Sequence & map types?     | ✅      | ❌      | ❌                |

Users of ort appreciate its ease of use and ergonomic API. ort is also battle-tested in some pretty serious production scenarios:

  • Twitter uses ort in part of their recommendations system, serving hundreds of millions of requests a day.
  • Bloop’s semantic code search feature is powered by ort.
  • SurrealDB uses ort in their surrealml package.
  • Numerical Elixir uses ort to create ONNX Runtime bindings for the Elixir language.
  • rust-bert implements many ready-to-use NLP pipelines in Rust Γ  la Hugging Face Transformers with both tch & ort backends.
  • edge-transformers also implements Hugging Face Transformers pipelines in Rust using ort.

Getting started

1. Add ort to your Cargo.toml

If you have a supported platform (and you probably do), installing ort couldn’t be any simpler! Just add it to your Cargo dependencies:

[dependencies]
ort = "2.0.0-alpha.4"
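
Execution providers (covered under Next steps below) are gated behind Cargo features. If you already know you’ll want GPU acceleration, you can enable the corresponding feature up front; for example, with the cuda feature (shown as an illustration; check ort’s feature list for your target hardware):

[dependencies]
ort = { version = "2.0.0-alpha.4", features = ["cuda"] }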

2. Convert your model

Your model will need to be converted to the ONNX format before you can use it.

  • The awesome folks at Hugging Face have a guide to export πŸ€— Transformers models to ONNX with πŸ€— Optimum.
  • For other PyTorch models: torch.onnx
  • For scikit-learn: sklearn-onnx
  • For TensorFlow, Keras, TFlite, TensorFlow.js: tf2onnx
  • For PaddlePaddle: Paddle2ONNX

3. Load your model

Once you’ve got a model, load it via ort by creating a Session:

use ort::{GraphOptimizationLevel, Session};

let model = Session::builder()?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(4)?
    .with_model_from_file("yolov8m.onnx")?;
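
The comparison table above notes that ort supports in-memory sessions: if your model is embedded in the binary rather than read from disk, the builder can load it from a byte slice instead. A minimal sketch, assuming the with_model_from_memory builder method:

let model = Session::builder()?
    // include_bytes! embeds the ONNX file into the executable at compile time.
    .with_model_from_memory(include_bytes!("yolov8m.onnx"))?;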

4. Perform inference

Preprocess your inputs, then call run() on the session to perform inference.

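The image input below is assumed to be an ndarray array that you’ve already filled with preprocessed pixel data. As a hypothetical placeholder for the YOLOv8 model loaded above, it could look like this:

use ndarray::{s, Array4};

// Hypothetical placeholder input; a real application would fill this with
// normalized pixels in NCHW layout (1×3×640×640 for YOLOv8).
let image = Array4::<f32>::zeros((1, 3, 640, 640));
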
let outputs = model.run(ort::inputs!["image" => image]?)?;

// Postprocessing: view the result as an ndarray, transpose it, then slice
// away the batch dimension.
let output = outputs["output0"]
    .extract_tensor::<f32>()?
    .view()
    .t()
    .slice(s![.., .., 0])
    .into_owned();
...
There are some more useful examples in the ort repo!

Next steps

1. Unlock more performance with EPs

Use execution providers to enable hardware acceleration in your app and unlock the full power of your GPU or NPU.
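
For example, registering the CUDA execution provider when building a session looks roughly like this (a sketch, assuming the cuda Cargo feature is enabled; ONNX Runtime falls back to the CPU provider if CUDA can’t be initialized):

use ort::{CUDAExecutionProvider, Session};

let model = Session::builder()?
    // Request CUDA; ort logs a warning and falls back to CPU if unavailable.
    .with_execution_providers([CUDAExecutionProvider::default().build()])?
    .with_model_from_file("yolov8m.onnx")?;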

2. Show off your project!

We’d love to see what you’ve made with ort! Show off your project in GitHub Discussions or on our Discord.