Pinterest Tiered Storage for Apache Kafka®️: A Broker-Decoupled Approach

Apache Kafka has become a ubiquitous PubSub solution, handling petabytes of data at Pinterest. To address growing storage demands, Tiered Storage has emerged as a design pattern that offloads data from expensive broker disks to cheaper remote storage. Native Tiered Storage in Kafka 3.6.0+ tightly couples the feature with the broker process, which limits flexibility. Pinterest's broker-decoupled Tiered Storage implementation separates storage from compute, providing advantages such as cost reduction, resource optimization, and easier adoption.

The decoupled approach has three components: a Segment Uploader that uploads finalized log segments to remote storage, a Tiered Storage Consumer for data consumption, and a remote storage system with lower per-unit storage costs. The Segment Uploader monitors broker file systems for finalized segments, detects leadership changes through ZooKeeper (or KRaft in newer Kafka versions), and handles fault tolerance to ensure data continuity. The Tiered Storage Consumer reads data from both local broker disk and remote storage, reducing the cost of serving.

Since May 2024, this decoupled implementation has offloaded ~200 TB of data per day from broker disks to cheaper object storage. It provides flexibility in Tiered Storage adoption and feature updates without affecting broker performance. The open-source implementation of Pinterest's broker-decoupled Tiered Storage for Apache Kafka is now available.
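
To make the Segment Uploader's first step concrete, here is a minimal Python sketch of how finalized segments might be identified in a partition's log directory. It is not Pinterest's implementation. It relies on one fact about Kafka's on-disk layout, that each segment file is named after its base offset zero-padded to 20 digits, and on one assumption, that the segment with the highest base offset is the active one still being written. The function names and the `uploaded_offsets` bookkeeping are hypothetical.

```python
import re
from pathlib import Path

# Kafka names each segment file after its base offset, zero-padded to 20 digits.
SEGMENT_RE = re.compile(r"^(\d{20})\.log$")


def finalized_segments(partition_dir):
    """Return finalized (rolled) .log segments in a partition directory, oldest first.

    Assumption: the segment with the highest base offset is the active one
    still being written by the broker, so every other segment is finalized.
    """
    segments = sorted(
        (int(m.group(1)), p)
        for p in Path(partition_dir).iterdir()
        if (m := SEGMENT_RE.match(p.name))
    )
    # Drop the last (active) segment; an empty directory yields an empty list.
    return [path for _, path in segments[:-1]]


def pending_uploads(partition_dir, uploaded_offsets):
    """Finalized segments whose base offset is not yet in `uploaded_offsets`.

    `uploaded_offsets` is a hypothetical set of base offsets already uploaded.
    """
    return [
        p for p in finalized_segments(partition_dir)
        if int(p.stem) not in uploaded_offsets
    ]
```

A real uploader would run this kind of check only for partitions the local broker currently leads, would upload the matching `.index` and `.timeindex` files alongside each `.log` file, and would more likely react to file system events than poll the directory.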