MOSEC

Model Serving made Efficient in the Cloud.

Get started

Apache-2.0 License. GitHub

Highly performant

The web layer and task coordination are built with Rust 🦀, offering blazing speed and efficient CPU utilization powered by async I/O.

Ease of use

The user interface is pure Python 🐍, so users can serve their models in an ML-framework-agnostic manner with the same code they use for offline testing.
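As an illustration of this idea (a minimal sketch, not the mosec API — `EchoWorker` and its `forward` method are hypothetical names), all model logic lives in one plain-Python method, so the exact same code path runs in offline tests and behind the server:

```python
# Illustrative sketch, NOT the mosec API: a worker class whose forward()
# holds all model logic, so the same code runs offline and when served.
class EchoWorker:
    """Framework-agnostic: forward() sees only plain Python data."""

    def forward(self, data: dict) -> dict:
        # Swap in any ML framework's inference call here.
        return {"echo": data["msg"].upper()}

# Offline testing exercises the exact code path the server would call.
worker = EchoWorker()
assert worker.forward({"msg": "hello"}) == {"echo": "HELLO"}
```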

Dynamic batching

Aggregate requests from different users for batched inference and distribute results back.

Pipelined stages

Spawn multiple processes for pipelined stages to handle CPU/GPU/IO mixed workloads.
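A rough sketch of the pattern (conceptual only, not mosec's implementation; the stage functions and queue wiring are invented for the example): each stage runs in its own process, and queues connect the stages so a CPU-bound stage can overlap with a (mocked) GPU inference stage.

```python
import multiprocessing as mp

def preprocess(inbox, outbox):
    # CPU-bound stage: normalize raw inputs.
    for item in iter(inbox.get, None):
        outbox.put(item.strip().lower())
    outbox.put(None)  # propagate shutdown to the next stage

def infer(inbox, outbox):
    # Stand-in for a GPU-bound inference stage.
    for item in iter(inbox.get, None):
        outbox.put(len(item))
    outbox.put(None)

def run_pipeline(items):
    """Wire the stages together with queues and run them as processes."""
    q1, q2, q3 = mp.Queue(), mp.Queue(), mp.Queue()
    stages = [mp.Process(target=preprocess, args=(q1, q2)),
              mp.Process(target=infer, args=(q2, q3))]
    for p in stages:
        p.start()
    for item in items:
        q1.put(item)
    q1.put(None)
    results = list(iter(q3.get, None))
    for p in stages:
        p.join()
    return results
```

Because the stages are separate processes rather than threads, a Python-heavy preprocessing stage does not contend with inference for the GIL, and each stage can be scaled independently by spawning more workers on the same queues.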

Do one thing well

Focus on the online serving part, so users can concentrate on model performance and business logic.

Cloud friendly

Designed to run in the cloud, with model warmup, graceful shutdown, and Prometheus monitoring metrics, so it is easily managed by Kubernetes or any container orchestration system.