MOSEC
Model Serving made Efficient in the Cloud.
Web layer and task coordination built with Rust 🦀, which offers blazing speed in addition to efficient CPU utilization powered by async I/O.
User interface purely in Python 🐍, so users can serve their models in an ML framework-agnostic manner with the same code they use for offline testing.
Aggregate requests from different users for batched inference and distribute results back.
Spawn multiple processes for pipelined stages to handle CPU/GPU/IO mixed workloads (see the sketch after this list).
Focus on the online serving part, so users can concentrate on model performance and business logic.
Designed to run in the cloud, with model warmup, graceful shutdown, and Prometheus monitoring metrics, easily managed by Kubernetes or any container orchestration system.
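As a rough sketch of how these pieces fit together, the snippet below wires a CPU-bound preprocessing stage into a dynamically batched inference stage using MOSEC's Python `Server`/`Worker` interface; the exact parameter names and behavior may vary across versions, so treat it as an illustration rather than a reference.

```python
# A minimal two-stage MOSEC service (API names assumed from the project's
# Python interface; check the installed version's documentation).
from mosec import Server, Worker


class Preprocess(Worker):
    """CPU-bound stage: parse and normalize one raw request at a time."""

    def forward(self, data: dict) -> float:
        # Single-item stage: receives one deserialized request payload.
        return float(data.get("value", 0.0))


class Inference(Worker):
    """Compute stage: runs on inputs aggregated from concurrent requests."""

    def forward(self, data: list) -> list:
        # Batched stage: `data` is a list; each element maps back to one request.
        return [x * 2 for x in data]


if __name__ == "__main__":
    server = Server()
    # Several preprocessing processes feed one batched inference stage.
    server.append_worker(Preprocess, num=2)
    server.append_worker(Inference, num=1, max_batch_size=16)
    server.run()
```

Once running, clients send ordinary HTTP requests (for example, a JSON POST to the service's inference endpoint), and the Rust web layer takes care of queuing, batching, and dispatching results back to each caller.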