RSS HackerNoon

Apparate: Early-Exit Models for ML Latency and Throughput Optimization - Design

Apparate is an end-to-end system that automatically integrates early exits into models and manages their operation throughout the inference process. Its overarching goal is to optimize per-request latencies while adhering to tight accuracy constraints and throughput goals. In addition, Apparate introduces two parameters: aggression and Os’ which runs atop existing serving platforms.
hackernoon.com
hackernoon.com