
Feature Caching for Recommender Systems with CacheLib

At Pinterest, we operate a large-scale online machine learning inference system that relies heavily on feature caching to deliver ML features efficiently. We adopted CacheLib, a Meta Open Source project, and extended it to build a high-throughput, flexible feature cache. Where the cache sits in the system is crucial, so the architecture evolved as our ML inference platform transitioned from CPU to GPU serving. We experimented with three cache architectures: a sharded DRAM cache, a single-node hybrid DRAM + NVM cache, and separate cache and inference nodes. We also built a pipeline that warms up the cold cache on new nodes before they start serving traffic. It has three steps: logging feature requests, uploading the logged requests to S3, and replaying them on the new nodes.
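The log-and-replay idea behind the warm-up pipeline can be illustrated with a minimal Python sketch. This is not Pinterest's implementation or the CacheLib API: `LRUCache`, `log_requests`, `warm_up`, and the JSON-lines log format are all hypothetical names chosen for illustration, a small in-memory LRU stands in for the real cache, and a local file stands in for the logged requests that would be uploaded to and downloaded from S3.

```python
import json
from collections import OrderedDict
from typing import Callable, Iterable, Optional


class LRUCache:
    """Tiny in-memory LRU cache (hypothetical stand-in for the feature cache)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()  # key -> cached feature bytes

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

    def __len__(self) -> int:
        return len(self._items)


def log_requests(keys: Iterable[str], path: str) -> None:
    """Step 1: append requested feature keys to a log, one JSON object per line.

    Step 2 (uploading this file to S3) is omitted; the local file is an
    assumed stand-in for the uploaded object.
    """
    with open(path, "a", encoding="utf-8") as f:
        for key in keys:
            f.write(json.dumps({"key": key}) + "\n")


def warm_up(cache: LRUCache, path: str, fetch: Callable[[str], bytes]) -> int:
    """Step 3: replay logged keys on a new node, filling the cache on misses.

    `fetch` is a hypothetical callback that loads a feature from the
    backing store. Returns the number of entries loaded.
    """
    loaded = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            key = json.loads(line)["key"]
            if cache.get(key) is None:
                cache.put(key, fetch(key))
                loaded += 1
    return loaded
```

On a new node, one would call `warm_up(cache, log_path, fetch)` before registering for traffic. Replaying in the logged order means the keys requested most recently end up at the most-recently-used end of the LRU, which roughly mirrors the state of a node that had served that traffic.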