Scaling Recommendation Systems with Request-Level Deduplication

Pinterest leverages request-level deduplication to optimize its recommendation models and manage infrastructure costs. This technique avoids redundant processing of request-level data, which includes massive user action sequences. Deduplication significantly reduces storage needs, with storage compression reaching 10-50x on user-heavy feature columns using Apache Iceberg. While implementing request-sorted data, they addressed issues and maintained model quality through SyncBatchNorm and user-level masking. This led to significant training speedups, with a 4x improvement for retrieval models and 2.8x for ranking models. This also improved serving throughput, enabling a 7x increase in ranking serving capacity using the Deduplicated Cross-Attention Transformer (DCAT) architecture. This comprehensive approach, yielded impactful improvements across storage, training, and serving. Ultimately, request-level deduplication is a cross-cutting technique with simple but effective solutions.

https://medium.com/pinterest-engineering/scaling-recommendation-systems-with-request-level-deduplication-93bd514142d9?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Apr 13