Netflix TechBlog | Medium

Introducing Impressions at Netflix

At Netflix, the images shown to members on the platform are called "impressions," and they play a crucial role in personalizing the user experience. Capturing and processing these impressions is a complex task that requires a sophisticated system: it tracks and processes billions of impressions daily and maintains a detailed history of each profile's exposure to them. This impression history is essential for enhanced personalization, frequency capping, highlighting new releases, and analytical insights.

The first step in managing impressions is building a Source-of-Truth (SOT) dataset that supports a range of downstream workflows and use cases. Raw impression events are collected on the client side, passed through a custom event extractor, and landed in Apache Kafka and Apache Iceberg. Apache Flink then filters, enriches, and structures the data, establishing the definitive source of truth for Netflix's impression data.

To keep impression data high quality, the system gathers detailed metrics and alerts the team to potential issues. The architecture is designed to handle a massive volume of impression events in real time, with a focus on scalability, flexibility, and high availability. Future work includes addressing unschematized events, automating performance tuning, and improving data-quality alerts.
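The filter, enrich, and structure step described above, and the per-profile impression history it feeds, can be illustrated with a small sketch. This is plain Python standing in for a streaming job, not Netflix's implementation: the raw event fields (`event_type`, `profile_id`, `title_id`, `ts`), the `TITLE_METADATA` lookup, the `Impression` schema, and the frequency-cap parameters are all hypothetical. The sketch drops non-impression and malformed events, joins in title metadata, emits typed records, and checks whether a profile has already seen a title too often within a time window.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical enrichment lookup: title ID -> metadata (illustrative, not a real schema).
TITLE_METADATA: Dict[str, Dict[str, object]] = {
    "t1": {"title_name": "Title One", "is_new_release": True},
    "t2": {"title_name": "Title Two", "is_new_release": False},
}


@dataclass(frozen=True)
class Impression:
    """One structured impression record in a hypothetical SOT schema."""
    profile_id: str
    title_id: str
    timestamp_ms: int
    title_name: str
    is_new_release: bool


def to_impression(raw: dict) -> Optional[Impression]:
    """Filter, enrich, and structure one raw event; returns None if it is dropped."""
    if raw.get("event_type") != "impression":
        return None  # filter: keep only impression events
    meta = TITLE_METADATA.get(raw.get("title_id"))
    if not raw.get("profile_id") or meta is None or "ts" not in raw:
        return None  # filter: missing profile, unknown title, or no timestamp
    return Impression(  # enrich with title metadata and emit a typed record
        profile_id=raw["profile_id"],
        title_id=raw["title_id"],
        timestamp_ms=int(raw["ts"]),
        title_name=str(meta["title_name"]),
        is_new_release=bool(meta["is_new_release"]),
    )


class ImpressionHistory:
    """In-memory per-profile impression history (stand-in for keyed streaming state)."""

    def __init__(self) -> None:
        # (profile_id, title_id) -> list of impression timestamps in milliseconds
        self._seen: Dict[Tuple[str, str], List[int]] = {}

    def record(self, imp: Impression) -> None:
        self._seen.setdefault((imp.profile_id, imp.title_id), []).append(imp.timestamp_ms)

    def is_capped(self, profile_id: str, title_id: str, now_ms: int,
                  window_ms: int, max_impressions: int) -> bool:
        """True if the profile saw the title at least max_impressions times in the window."""
        since = now_ms - window_ms
        timestamps = self._seen.get((profile_id, title_id), [])
        count = sum(1 for ts in timestamps if since <= ts <= now_ms)
        return count >= max_impressions


def process(raw_events: Iterable[dict], history: ImpressionHistory) -> List[Impression]:
    """Turn a batch of raw events into SOT records and update the history."""
    records: List[Impression] = []
    for raw in raw_events:
        imp = to_impression(raw)
        if imp is not None:
            history.record(imp)
            records.append(imp)
    return records


if __name__ == "__main__":
    events = [
        {"event_type": "impression", "profile_id": "p1", "title_id": "t1", "ts": 1_000},
        {"event_type": "click", "profile_id": "p1", "title_id": "t1", "ts": 1_500},
        {"event_type": "impression", "profile_id": "p1", "title_id": "t1", "ts": 2_000},
        {"event_type": "impression", "profile_id": "p1", "title_id": "t9", "ts": 2_500},
    ]
    history = ImpressionHistory()
    sot = process(events, history)
    print(len(sot))  # 2: the click and the unknown title t9 are dropped
    # Two impressions of t1 within the last 5 s reach a cap of 2.
    print(history.is_capped("p1", "t1", now_ms=3_000, window_ms=5_000, max_impressions=2))  # True
```

In a real streaming job, the in-memory dictionary would typically be replaced by keyed, fault-tolerant state, and the records would be written to a sink such as a Kafka topic or an Iceberg table; those details are outside the scope of this sketch.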
netflixtechblog.com