Apparate: Early-Exit Models for ML Latency and Throughput Optimization - Preparing Models

The idea is that such ramps take advantage of all the outputs the original model has already computed to boost their predictions. Apparate accepts a model in the ONNX format, a widely used intermediate representation (IR) that expresses the computation as a directed acyclic graph [6]. Once the model is ingested, Apparate must first identify candidate layers for ramp addition.
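This section does not say how candidate layers are chosen. Below is a minimal sketch under one assumption of our own, not Apparate's documented rule: a layer is a candidate if it lies on every input-to-output path of the DAG. At such a layer no skip connection carries data around it, so a ramp attached there sees everything computed so far. The function `candidate_ramp_layers` and the toy residual-style graph are hypothetical. The graph is a plain dict mapping each node name to the names of the nodes that consume its output, not an ONNX `ModelProto`. The check counts paths in one forward pass and one backward pass over a topological order.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def candidate_ramp_layers(edges: Dict[str, List[str]], source: str, sink: str) -> List[str]:
    """Return nodes, in topological order, that lie on every source->sink path.

    Hypothetical helper. The assumed criterion is "no path bypasses this node".
    edges maps each node to the nodes that consume its output.
    """
    # Gather every node, including nodes that only appear as consumers.
    nodes = set(edges)
    for succs in edges.values():
        nodes.update(succs)

    # graphlib expects a mapping of node -> predecessors.
    preds: Dict[str, List[str]] = {n: [] for n in nodes}
    for u, succs in edges.items():
        for v in succs:
            preds[v].append(u)
    order = list(TopologicalSorter(preds).static_order())  # raises CycleError on cycles

    # Forward pass: number of distinct paths from source to each node.
    from_src = {n: 0 for n in nodes}
    from_src[source] = 1
    for n in order:
        for v in edges.get(n, []):
            from_src[v] += from_src[n]

    # Backward pass: number of distinct paths from each node to sink.
    to_sink = {n: 0 for n in nodes}
    to_sink[sink] = 1
    for n in reversed(order):
        for v in edges.get(n, []):
            to_sink[n] += to_sink[v]

    # A node lies on every path iff the paths through it account for all paths.
    total = from_src[sink]
    return [
        n for n in order
        if n not in (source, sink) and total > 0 and from_src[n] * to_sink[n] == total
    ]


# Toy graph with one skip connection (conv1 -> add1 bypasses conv2).
edges = {
    "input": ["conv1"],
    "conv1": ["conv2", "add1"],
    "conv2": ["add1"],
    "add1": ["conv3"],
    "conv3": ["output"],
}
print(candidate_ramp_layers(edges, "input", "output"))  # → ['conv1', 'add1', 'conv3']
```

`conv2` is excluded because the skip connection routes data around it, so a ramp there would miss part of the model's state. To apply the sketch to a real ONNX file, one could load the model with `onnx.load` and build `edges` by linking each entry of `model.graph.node` to the nodes whose `input` names match its `output` names. That wiring is omitted here to keep the example dependency-free.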