Pinterest Engineering | Medium - TheNote.app

Pinterest Engineering | Medium

Pinterest Engineering, showcased on Medium, provides a behind-the-scenes look at the technological innovations driving the popular visual discovery platform. Through in-depth articles, engineers share insights into their work on scalability, machine learning, data infrastructure, and more. The publication highlights Pinterest's engineering culture, emphasizing collaboration, experimentation, and a passion for solving complex problems. Readers can explore topics like building recommendation systems, optimizing search functionality, and developing tools for data analysis. The content offers valuable perspectives for engineers and tech enthusiasts interested in the intricacies of a large-scale platform like Pinterest. Whether delving into the challenges of image recognition or the evolution of their infrastructure, Pinterest Engineering on Medium provides a fascinating glimpse into the technical side of a beloved online destination.

Stories by Pinterest Engineering on Medium medium.com

RSS Hunter • Aug 19, 2024

Thread Of Notes

Achieving Near-Linear Training Scalability for Pinterest’s Foundation Models

Pinterest's foundation models are crucial for their recommendation systems, impacting millions of users daily. Initially, multi-node training for these large models performed poorly, with adding more machines drastically slowing down the process. Even with AWS Elastic Fabric Adapter (EFA) for improved networking, scaling remained inefficient. Profiling revealed that distributed embedding lookups caused significant communication bottlenecks, with GPUs waiting on data. The team implemented several optimizations to address this communication overhead. Quantized Communications (QComms) reduced the data payload by compressing embedding tensors. Balanced sharding improved workload distribution across GPUs. Bandwidth-aware embedding optimization halved embedding dimensions to decrease data movement. A key breakthrough was implementing 2D Parallelism, initially optimizing for AllReduce, which improved local communication. Finally, they flipped the 2D Parallelism topology to optimize for All-to-All, keeping expensive operations within nodes and using cheaper AllReduce for cross-node synchronization. This led to near-linear scaling, achieving 2.0x at 2 nodes and 3.9x at 4 nodes, and an impressive 7.5x scaling at 8 nodes. These advancements enabled training larger models, resulting in significant user engagement gains on Pinterest's recommendation surfaces and faster experimentation cycles.
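
To put the reported speedups in context, they can be converted into a scaling efficiency, the fraction of ideal linear speedup achieved at each node count. This is a minimal arithmetic sketch; the function name is an illustrative assumption and only the three speedup figures come from the summary above.

```python
def scaling_efficiency(speedup: float, nodes: int) -> float:
    """Fraction of ideal linear speedup achieved at a given node count."""
    return speedup / nodes


# Speedups reported in the summary above, keyed by node count.
reported = {2: 2.0, 4: 3.9, 8: 7.5}
efficiency = {n: scaling_efficiency(s, n) for n, s in reported.items()}

for n, e in efficiency.items():
    print(f"{n} nodes: {e:.1%} of linear")
```

By this measure the 8-node run retains 7.5 / 8 = 93.75% of ideal throughput, which is what "near-linear" means here.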

https://medium.com/pinterest-engineering/achieving-near-linear-training-scalability-for-pinterests-foundation-models-14d4f59fe6f6?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Jun 25

Automated Schema Evolution in Pinterest’s Next-Generation DB Ingestion Framework

Pinterest has developed a robust, automated schema evolution framework for their Kafka-based CDC ingestion platform. Schema changes are a critical, cross-system contract, and unchecked evolution can lead to pipeline failures and data inconsistencies. Their solution focuses on making schema evolution safe, repeatable, and scalable by treating it as a multi-stage convergence process. The architecture involves CDC sources, Kafka, Flink for transformation, and Spark for upserts into Iceberg tables. A core component is a reliable onboarding model that uses schema definition files with stable numeric identifiers as the source of truth. Updates propagate automatically across Kafka, Flink, Spark, and Iceberg through a PR-based rollout with versioning and auditing. The system supports primarily additive schema changes to maintain backward compatibility and minimize complexity. Type changes are strictly limited to those preserving semantic meaning, like numeric precision widening. Schema evolution is managed through a three-phase convergence model to maintain pipeline availability. Phase one updates Iceberg schemas, phase two deploys updated Flink and Spark code, and phase three ensures data convergence. This phased approach decouples schema propagation from data correctness, allowing temporary divergence within a defined SLA. Pinterest employs an SLA-based model for schema evolution, prioritizing predictability and operational safety. Deployment strategies are carefully managed, especially for Flink, to prevent data loss. Unsupported or ambiguous cases, such as default values or primary key changes, have specific manual recovery paths. Ambiguous CREATE TABLE diffs are resolved by comparing against the database's actual DDL history rather than inferring intent from textual changes. Concurrent schema changes are handled sequentially to prevent race conditions, ensuring serialized convergence. Column transformations are managed by annotating schemas with required transformations, which are then injected into the ingestion pipeline. Error handling and recovery mechanisms, particularly for Spark failures, ensure that processing resumes from the last successful watermark.
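
The additive-only rule with limited type widening can be illustrated with a small validator keyed on stable numeric field identifiers. This is a sketch under stated assumptions, not Pinterest's implementation: the `Field` class, the `check_evolution` function, and the contents of `SAFE_WIDENINGS` are hypothetical names and rules chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical widening rules: each (old, new) pair preserves semantic meaning.
SAFE_WIDENINGS = {("int", "long"), ("float", "double")}


@dataclass(frozen=True)
class Field:
    field_id: int  # stable numeric identifier, survives renames
    name: str
    type: str


def check_evolution(old, new):
    """Return a list of violations; an empty list means the change is allowed."""
    new_by_id = {f.field_id: f for f in new}
    violations = []
    for f in old:
        g = new_by_id.get(f.field_id)
        if g is None:
            # Additive-only: existing fields may not disappear.
            violations.append(f"field {f.field_id} ({f.name}) was removed")
        elif g.type != f.type and (f.type, g.type) not in SAFE_WIDENINGS:
            violations.append(
                f"field {f.field_id} type change {f.type} -> {g.type} is not a widening"
            )
    return violations
```

Matching on `field_id` rather than on the column name is what lets a check like this tell a rename apart from a drop followed by an add.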

https://medium.com/pinterest-engineering/automated-schema-evolution-in-pinterests-next-generation-db-ingestion-framework-36c5c07070de?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Jun 24

Making User-Sequence Data More Cost-Efficient, Faster, and Easier to Use

Pinterest redesigned its user-sequence platform to provide a robust and efficient system for retrieving user behavior data for ML models. The core goal is to deliver consistent, fresh, complete, and cost-effective sequences across training, analysis, and serving. The platform defines user sequences as ordered lists of recent, enriched events. Key challenges addressed include ensuring data freshness, completeness, consistency, and scalability across different use cases and teams. The solution employs a "one definition, many runtimes" approach, using configuration-as-code and a shared execution engine to process events in real time and in batch. The platform implements a lambda architecture to manage both current and historical data. This design allows for easier onboarding of new event types and enrichments, improved code review, and reduced drift between real-time and batch processing. The three crucial design decisions are configuration-as-code for sequences and enrichments, a shared execution engine, and a lambda architecture for sequences. The result is a platform that simplifies the process of building, maintaining, and utilizing user sequences for various ML tasks within the company.
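
The "one definition, many runtimes" idea can be sketched as a single declarative definition plus one shared step function that both a batch job and a streaming job call. Everything named here (`SequenceDefinition`, `apply_event`, `build_batch`, and the event fields `type` and `ts`) is a hypothetical illustration rather than the platform's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceDefinition:
    name: str
    event_types: frozenset  # which event types belong in the sequence
    max_length: int         # keep only the most recent N events


def apply_event(definition, sequence, event):
    """Shared step: append a matching event and keep the most recent ones."""
    if event["type"] not in definition.event_types:
        return sequence
    return (sequence + [event])[-definition.max_length:]


def build_batch(definition, events):
    """Batch runtime: fold the shared step over a timestamp-ordered event log."""
    sequence = []
    for event in sorted(events, key=lambda e: e["ts"]):
        sequence = apply_event(definition, sequence, event)
    return sequence
```

A streaming runtime would call `apply_event` once per incoming event against the stored sequence. Because both paths share the same step, the real-time and batch results cannot drift apart through duplicated logic.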

https://medium.com/pinterest-engineering/making-user-sequence-data-more-cost-efficient-faster-and-easier-to-use-2a56a928cae1?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • May 21

An Engineer’s Guide to Better AI Skills: Implementing a Testing Process to Optimize Agent…

Engineers are experiencing unreliability when using AI agents, especially when they need to invoke custom skills. To resolve this, tests were conducted on agents using a specific iOS architecture skill. The goal was to quantify skill invocation reliability and identify optimization techniques. A core testing tool was built based on a Bash script; this orchestrated automated testing using prompts, capturing logs, and checking results. Positive and negative test cases were defined and used to evaluate the skill's ability to be invoked. Log parsing techniques were implemented to detect the skill's invocation based on JSON output patterns. Key performance metrics like success rate and accuracy were calculated to assess the agents' performance. Initial testing revealed that both agents had imperfect skill invocation rates, especially with ambiguous prompts. Several optimizations were discovered, including enhancing the skill description, using aggressive language, and adding a skills table. Combining multiple techniques provided improved results, particularly for the Codex agent. The conclusion highlighted the importance of testing and improving skill invocation processes. Developers must use high-quality, thorough prompts to maximize AI agent effectiveness.
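
The log-parsing and scoring steps might look roughly like the following. The summary says the original tool was a Bash script, so this Python version is only an illustrative equivalent. The JSON event shape (`"type": "tool_use"` with a `"name"` field) and the exact definitions of success rate and accuracy are assumptions, since the summary does not specify them.

```python
import json


def skill_invoked(log_text, skill_name):
    """Scan JSON-lines log output for an event that names the skill.

    The event shape checked here is a hypothetical pattern; real agent
    logs will differ.
    """
    for line in log_text.splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if (
            isinstance(event, dict)
            and event.get("type") == "tool_use"
            and event.get("name") == skill_name
        ):
            return True
    return False


def score(cases, skill_name):
    """cases: list of (log_text, should_invoke) pairs."""
    positives = [c for c in cases if c[1]]
    hits = sum(skill_invoked(log, skill_name) for log, _ in positives)
    correct = sum(skill_invoked(log, skill_name) == expected for log, expected in cases)
    return {
        # Assumed definition: share of positive cases where the skill fired.
        "success_rate": hits / len(positives) if positives else 0.0,
        # Assumed definition: share of all cases with the expected outcome.
        "accuracy": correct / len(cases) if cases else 0.0,
    }
```

Negative cases matter because a skill that fires on every prompt would score a perfect success rate while still being wrong for unrelated requests.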

https://medium.com/pinterest-engineering/an-engineers-guide-to-better-ai-skills-implementing-a-testing-process-to-optimize-agent-a000c9c9abcd?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • May 12

Enhancing Ad Relevance: Integrating Real-Time Context into Sequential Recommender Models

The authors developed a Contextual Sequential Two Tower Model to improve ad recommendations on Pinterest, especially for context-specific surfaces like Related Pins. The initial model lacked real-time context, hindering its effectiveness because it relied solely on historical user behavior. To solve this, they integrated a context layer into the model's architecture, allowing the model to incorporate information from the user's current activity. They used synthetic data during training, injecting pseudo-context derived from conversion events to teach the model. A hybrid serving flow was adopted, where most of the user tower processing is done offline, but the context layer is processed online. This allows for dynamic user embeddings influenced by real-time context, improving relevance. Offline evaluations showed a significant improvement in Recall@K compared to the previous production model. The new model increased candidate survival rates and improved ad relevance, especially on the Related Pins surface. This resulted in a measurable increase in conversion-related business metrics, particularly Return on Ad Spend (ROAS). Future work includes expanding the model to other surfaces like Search and experimenting with advanced fusion techniques, such as cross-attention. This work demonstrates the importance of incorporating real-time context for enhancing ad relevance and user experience.

https://medium.com/pinterest-engineering/enhancing-ad-relevance-integrating-real-time-context-into-sequential-recommender-models-bc3a2f9b682e?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • May 8

Optimizing ML Workload Network Efficiency (Part I): Feature Trimmer

Pinterest's online ML serving system uses a root-leaf architecture where client services request scores for Pins. The root component handles feature retrieval and preprocessing, while leaves perform model inference, often on GPUs. This design simplifies onboarding new models and optimizes resource utilization by separating CPU and GPU workloads. However, it led to a network bottleneck between the root and leaf partitions due to passing many features. Initially, lz4 compression was implemented to reduce network usage, resulting in significant bandwidth savings but with a slight increase in CPU usage and latency. This was a good start, but the core issue of shipping unnecessary features persisted. The "Send What You Use" approach was then developed to address this by only sending features that a specific model requires. The model signature, which defines a model's inputs and outputs, serves as the source of truth for feature requirements. As models are trained and exported, their signatures are saved alongside them. Leaves then load these signatures to build feature converters that process only the necessary features. To synchronize feature requirements between the root and leaves, model signatures are published as lightweight artifacts. These signatures are aggregated into bundle-level mappings, which are then deployed to the root alongside existing configurations. This deployment follows the same staged delivery process as model rollouts, ensuring consistency and enabling graceful rollbacks. This integration allows the Feature Trimmer to dynamically update feature allowlists on the root, ensuring that only essential features are transmitted. The system is designed to handle frequent model updates and gradual rollouts by using versioned lookups and fallback mechanisms. This ensures that the root's view of required features stays synchronized with the actual models deployed on the leaves. By trimming unneeded features, Pinterest significantly reduced network traffic and improved infrastructure efficiency.
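
The "Send What You Use" idea can be sketched as building an allowlist from model signatures and filtering the feature map before it leaves the root. The function names, the shape of `signatures`, and the choice to send everything when no allowlist is available are assumptions made for illustration. The summary mentions fallback mechanisms but does not describe them.

```python
from typing import Dict, List, Optional, Set


def build_allowlist(signatures: Dict[str, Set[str]], bundle: List[str]) -> Set[str]:
    """Union of the input features required by every model in a bundle."""
    allow: Set[str] = set()
    for model in bundle:
        allow |= signatures[model]
    return allow


def trim(features: Dict[str, object], allowlist: Optional[Set[str]]) -> Dict[str, object]:
    """Keep only allowlisted features.

    Assumed fallback: with no known allowlist, send everything rather than
    risk dropping a feature that a model needs.
    """
    if allowlist is None:
        return features
    return {name: value for name, value in features.items() if name in allowlist}
```

Taking the union across a bundle is the conservative choice: a feature is shipped if any model behind that leaf might read it.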

https://medium.com/pinterest-engineering/optimizing-ml-workload-network-efficiency-part-i-feature-trimmer-ae20beb08d69?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • May 1

From Clicks to Conversions: Architecting Shopping Conversion Candidate Generation at Pinterest

Pinterest developed a dedicated candidate generation model for conversion ads to address challenges with offsite conversion data sparsity and noise. This model differs from previous engagement-based systems by focusing on lower-funnel conversions. The initial launch in 2023 yielded significant improvements in both conversion and engagement metrics, including a higher clickthrough rate. Further iterations in 2025 delivered even greater conversion value and enhanced advertiser return on ad spend. To combat data sparsity, the model is trained across all shopping surfaces using a multi-surface approach. It supplements primary conversion signals with onsite engagement data, re-weighting click data based on duration to mitigate noise. Harder negatives, such as ad impressions with no engagement, are used for more robust contrastive learning. The model incorporates user-side features capturing real-time intent and long-term preferences, alongside Pin-side features for semantic understanding and performance tracking. A two-tower architecture with DCN v2 and an MLP in parallel cross layers enhances feature interaction modeling and retrieval quality. The model evolved from a multi-head design to a unified multi-task architecture, allowing direct benefit from multi-task optimization during serving. An advertiser-level loss function was introduced to provide a more stable granularity for conversion signals, leading to substantial recall improvements. This new model successfully increased shopping conversion volume and improved advertiser performance while enhancing the user shopping experience.

https://medium.com/pinterest-engineering/from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation-at-pinterest-04cae5e1455b?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Apr 27

Smarter URL Normalization at Scale: How MIQPS Powers Content Deduplication at Pinterest

Pinterest uses content understanding to drive distribution and engagement, requiring insight into images and outbound links. The core problem is URL normalization, where identical product pages appear under varied URLs due to tracking parameters. This redundancy leads to wasted computational resources through repeated fetching and processing. Item canonicalization aims to unify identical items represented by different URLs, crucial for shopping catalogs. When item IDs are absent, advanced URL normalization is vital for deduplication. The Minimal Important Query Param Set (MIQPS) algorithm automatically learns which URL parameters influence content identity. It distinguishes between neutral parameters, which don't affect page content, and non-neutral parameters, which do. While static rules work for well-known platforms, Pinterest's vast domain set requires a dynamic, data-driven approach. The MIQPS algorithm operates in three steps. First, it collects a corpus of observed URLs per domain from Pinterest's ingestion pipeline. Second, URLs are grouped by their query parameter pattern, ensuring parameters are analyzed in their specific context. This prevents misclassifying a parameter based on a different URL type. Finally, for each parameter within a pattern, the algorithm empirically tests its importance. It samples URLs with distinct parameter values and computes content IDs for both the original and modified (parameter-removed) URLs. If removing the parameter changes the content ID in a significant percentage of samples, it's classified as non-neutral and retained. Otherwise, it's deemed neutral and can be safely stripped for normalization. Each merchant domain receives its own MIQPS map, accounting for domain-specific parameter meanings.
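
The three steps described above can be sketched for a single domain as follows. This is a simplified illustration, not the production algorithm: `content_id` stands in for whatever fetch-and-fingerprint step is used, the `threshold` value is an assumption, and every URL in a group is tested instead of a sample with distinct parameter values.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def pattern_of(url):
    """Query parameter pattern: the sorted tuple of distinct parameter names."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return tuple(sorted({name for name, _ in pairs}))


def drop_param(url, name):
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def learn_miqps(urls, content_id, threshold=0.5):
    """Map each query pattern to its set of non-neutral parameters."""
    groups = defaultdict(list)
    for url in urls:  # step 2: group by parameter pattern
        groups[pattern_of(url)].append(url)
    result = {}
    for pattern, members in groups.items():
        important = set()
        for name in pattern:  # step 3: test each parameter empirically
            changed = sum(content_id(u) != content_id(drop_param(u, name)) for u in members)
            if changed / len(members) >= threshold:
                important.add(name)
        result[pattern] = important
    return result
```

Normalization would then keep only the parameters in the learned set for a URL's pattern and strip the rest.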

https://medium.com/pinterest-engineering/smarter-url-normalization-at-scale-how-miqps-powers-content-deduplication-at-pinterest-4aa42e807d7d?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Apr 20

Finding zombies in our systems: A real-world story of CPU bottlenecks

Pinterest's ML platform team encountered crashing Ray-based training jobs due to intermittent network connectivity issues, prompting an investigation by the PinCompute team. The investigation, spanning over three months, revealed that the failures correlated with ENA network driver resets on AWS EC2 instances. These resets, caused by CPU starvation, were linked to high system CPU usage. Initially, the team tried various solutions, such as huge pages and alternative memory allocators, all of which failed to resolve the issue. Oddly, the issues occurred in only one of Pinterest's AWS availability zones. Profiling with perf and mpstat identified instances of single CPU core saturation. A temporal profiling setup using perf revealed the culprit: a process that was sporadically consuming high CPU resources. That process was identified as a zombie process. The discovery of zombies and their impact on CPU utilization and network driver performance led to a deeper understanding of the system bottlenecks.

https://medium.com/pinterest-engineering/finding-zombies-in-our-systems-a-real-world-story-of-cpu-bottlenecks-ea4722e552eb?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Apr 15

Scaling Recommendation Systems with Request-Level Deduplication

Pinterest leverages request-level deduplication to optimize its recommendation models and manage infrastructure costs. This technique avoids redundant processing of request-level data, which includes massive user action sequences. Deduplication significantly reduces storage needs, with storage compression reaching 10-50x on user-heavy feature columns using Apache Iceberg. While implementing request-sorted data, they addressed issues and maintained model quality through SyncBatchNorm and user-level masking. This led to significant training speedups, with a 4x improvement for retrieval models and 2.8x for ranking models. It also improved serving throughput, enabling a 7x increase in ranking serving capacity using the Deduplicated Cross-Attention Transformer (DCAT) architecture. This comprehensive approach yielded impactful improvements across storage, training, and serving. Ultimately, request-level deduplication is a cross-cutting technique with simple but effective solutions.
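
The core idea, that request-level data is identical for every candidate scored in the same request, can be illustrated by splitting flat rows into a per-request table and a slimmer per-candidate table. This sketch only shows the concept. It is not how Apache Iceberg or the DCAT architecture achieve the reported gains, and the column names are hypothetical.

```python
def deduplicate(rows):
    """Split flat rows into per-request data and slim per-candidate rows.

    rows: dicts with 'request_id', 'user_sequence', 'item_id', and 'label'
    keys (hypothetical column names).
    """
    requests = {}
    candidates = []
    for row in rows:
        # Store the heavy request-level payload once per request.
        requests.setdefault(row["request_id"], row["user_sequence"])
        candidates.append(
            {
                "request_id": row["request_id"],
                "item_id": row["item_id"],
                "label": row["label"],
            }
        )
    return requests, candidates
```

With hundreds of candidates per request and long user action sequences, the per-request payload dominates row size, which is why storing or processing it once pays off across storage, training, and serving.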

https://medium.com/pinterest-engineering/scaling-recommendation-systems-with-request-level-deduplication-93bd514142d9?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Apr 13

Performance for Everyone

Performance is crucial for mobile apps, akin to a default feature like time on a watch. Pinterest focuses heavily on measuring, protecting, and improving performance across key user experiences. User-perceived latency, reflecting the time users wait for content, is a vital performance metric. Measuring this latency, particularly Visually Complete time, was previously complex and time-consuming. Because Visually Complete varies greatly, customized measurement logic was needed, hindering performance work. Pinterest's performance team sought an easy solution for product engineers to access latency data. This led to integrating Visually Complete logic into the base UI class, automatically measuring latency for any UI surface. The system works by traversing the view tree, identifying and tracking key media views for rendering progress. This unified system provides latency data on over 60 Android surfaces, aiding performance monitoring. It enables fair performance comparisons across different features, including those with short lifespans. This simplified approach makes performance visible to all engineers, fostering optimization of user-perceived latency. Similar implementations have also been extended to iOS and web platforms.

https://medium.com/pinterest-engineering/performance-for-everyone-21a560260d08?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Apr 8

Evolution of Multi-Objective Optimization at Pinterest Home feed

Pinterest's feed recommendation uses a cascaded system for item selection and presentation. The final stage focuses on multi-objective optimization, balancing engagement, new use case adoption, and business goals. The team improved this stage through algorithmic and infrastructure upgrades over time. They initially used a Determinantal Point Process (DPP) algorithm for feed diversification, showing significant user engagement improvements. They later implemented Sliding Spectrum Decomposition (SSD), offering lower computational complexity and flexibility. SSD enabled incorporating quality goals, leading to a "soft spacing" penalty for content requiring extra caution. This framework avoids restrictive filtering, creating a better user experience. The system infrastructure evolved, moving logic to a model server for easier experimentation. Diversity signals have also improved, incorporating visual, text, and graph embeddings for better pin similarity computations. They introduced content quality signals and upgraded visual embeddings for real-time improvements. Semantic IDs were then added to manage semantic overlap for more effective diversity control.

https://medium.com/pinterest-engineering/evolution-of-multi-objective-optimization-at-pinterest-home-feed-06657e33cd10?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Apr 7

Zero-Downtime PyTorch Upgrade in Production: Approaches, Pitfalls and Lessons

Pinterest upgraded its machine learning stack from PyTorch 2.1 to 2.6 to leverage new features and improve performance. This upgrade involved addressing challenges like outdated dependencies, breaking API changes, and TorchScript compatibility. They updated the Ubuntu DLAMI and CUDA versions to meet PyTorch 2.6 requirements. They encountered and resolved TorchScript initialization issues by disabling JIT profiling and disabling the fuser for TorchScript. They mitigated breaking API changes by introducing a compile-time macro to bridge versions. A time-windowed multi-stage rollout was adopted to minimize downtime and control production impact. Following the upgrade, they fixed DCGM metric loss issues by addressing a resource conflict. The update also involved resolving intermittent model deployment failures. These updates involved a transition to a new DLAMI, resolving conflicts, and adapting to changes. The ultimate goal was to ensure a smooth and reliable production transition.

https://medium.com/@Pinterest_Engineering/zero-downtime-pytorch-upgrade-in-production-approaches-pitfalls-and-lessons-db3f456dc794?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Mar 30

Building an MCP Ecosystem at Pinterest

Pinterest developed a Model Context Protocol (MCP) ecosystem to enable AI agents. MCP allows large language models to interact with tools and data using a unified protocol. The architecture consists of multiple domain-specific, cloud-hosted MCP servers. A central registry manages these servers, providing discovery and authorization. Engineers can write tools and the platform handles deployments. The platform integrated MCP into existing workflows, like the internal AI chat. Security is paramount, with a dedicated standard and two-layer authorization using JWTs and mesh identities. Business-group-based access gating mitigates risks for sensitive actions. Human-in-the-loop approvals are required for sensitive operations for safety. The system is designed to be observable, logging inputs and outputs for impact analysis. The MCP ecosystem is saving engineers significant amounts of time, with over 66,000 monthly invocations. Pinterest plans to expand MCP usage by adding servers and refining governance.

https://medium.com/pinterest-engineering/building-an-mcp-ecosystem-at-pinterest-d881eb4c16f1?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Mar 19

Unified Context-Intent Embeddings for Scalable Text-to-SQL

Pinterest developed an Analytics Agent to improve its Text-to-SQL capabilities for its vast data warehouse. They faced challenges due to the scale and complexity of their data, with numerous tables and diverse analytical needs. The agent leverages unified context-intent embeddings to capture the meaning behind queries, ensuring semantic understanding. Simultaneously, it extracts structural and statistical patterns and incorporates governance metadata to rank results. The data warehouse initially needed cleanup and standardization, which led to a table governance program with tiered classifications. Analytical knowledge is encoded from query history, moving beyond simple keyword matching. SQL queries are translated into natural language descriptions, capturing the original analytical intent through a three-step process. Generalizable descriptions and analytical questions create a reusable knowledge base. This natural-language description is then embedded into a vector representation for intent-based retrieval. Structural and statistical patterns are also extracted, including join and aggregation patterns. These patterns combine with governance metadata to inform a governance-aware ranking system. The agent utilizes these two dimensions to provide the necessary information for generating and validating answers to analytics questions.

https://medium.com/pinterest-engineering/unified-context-intent-embeddings-for-scalable-text-to-sql-793635e60aac?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Mar 6

Unifying Ads Engagement Modeling Across Pinterest Surfaces

Pinterest consolidated its ad engagement models from three separate, surface-specific models into a single unified architecture. This change aimed to address inefficiencies like slow iteration, redundant training costs, and maintenance burdens. The project followed a strategy prioritizing simplicity and safe iteration, starting with merging the strongest components. The initial baseline unified model saw offline improvements but increased costs, leading to further refinement. The refined architecture incorporated elements from different surfaces, like MMoE and long user sequences, achieving better results with a more reasonable cost. Surface-specific calibration was implemented to handle traffic distribution differences across surfaces effectively. Multi-task learning and surface-specific exports were introduced for flexibility and surface-specific iterations. Efficiency optimizations, including projection layers and request-level broadcasting, reduced infrastructure costs and latency. The unified model demonstrated significant improvements in both offline and online metrics. This consolidation enables faster and more consistent improvements. Finally, the next step involves unifying the Related Pins surface, with a focus on model efficiency.

https://medium.com/pinterest-engineering/unifying-ads-engagement-modeling-across-pinterest-surfaces-4b5cd3d99e67?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Mar 3

Bridging the Gap: Diagnosing Online–Offline Discrepancy in Pinterest’s L1 Conversion Models

Pinterest's L1 ranking stage, crucial for ad performance, faced a persistent online-offline (O/O) discrepancy when deploying new conversion rate (CVR) models. Despite strong offline gains in loss and calibration, online A/B tests showed neutral or negative results, leading to launch delays. The investigation involved a full-stack diagnosis addressing model and evaluation, serving and features, and the funnel's impact. Initial checks ruled out offline evaluation issues, exposure bias, and serving failures as primary causes. Feature O/O discrepancy, where serving lacked features used in training, and embedding version skew, resulting in query and pin tower misalignment, were identified as key problems. The solution involved feature onboarding and addressing embedding skew, improving feature coverage and aligning model versions. Further analysis revealed the importance of funnel alignment and metric matching, where improvements in L1 metrics might not translate to CPA gains due to existing funnel limitations. This highlighted the need to consider O/O discrepancy as a core design constraint for model deployment.

https://medium.com/pinterest-engineering/bridging-the-gap-diagnosing-online-offline-discrepancy-in-pinterests-l1-conversion-models-1320faaaeefe?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Feb 27

Piqama: Pinterest Quota Management Ecosystem

Pinterest developed Piqama, a generic quota management platform, to oversee resource usage across various systems. Piqama manages the entire quota lifecycle, including schema management, validation, and update authorization through a centralized portal. The platform provides default enforcement and punishment strategies while allowing for customization via application-specific logic. Governance and optimization features include collecting usage statistics and enabling auto-rightsizing for efficient resource allocation. Budgets and quotas are interconnected, with chargeback systems influencing quota settings and resource allocation. Piqama is implemented in two areas: capacity-based quota management for the Big Data Processing Platform and rate-limiting quotas for online storage. In the Big Data Platform, Piqama manages memory, vcore, and concurrent applications, with automatic and manual adjustments to quotas. The Big Data Platform uses Yunikorn and a resource database for accurate quota calculation and enforcement. A new rate-limiting framework for online storage is introduced to enhance system resource allocation and cost control. The rate-limiting framework aims to streamline lifecycle management, connect the rate limit to resource usage, and utilize Piqama as its control plane. This approach offers a robust, flexible, and centralized solution for managing resources across Pinterest's diverse platforms while reducing the need for manual adjustments.

https://medium.com/pinterest-engineering/piqama-pinterest-quota-management-ecosystem-dc7881433bf5?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Feb 24

Drastically Reducing Out-of-Memory Errors in Apache Spark at Pinterest

Pinterest, using Apache Spark, tackled frequent out-of-memory errors (OOMs) in its large-scale data processing environment. They introduced "Auto Memory Retries" to automatically retry tasks failing with OOM on executors with increased memory. The primary goal was to reduce on-call load and save costs associated with failing applications. The core idea involved assigning tasks with higher memory needs a specific resource profile. Pinterest's custom Apache Spark version modifies the scheduling loop to retry tasks with larger memory profiles using a hybrid approach. This approach can increase the CPU per task, or launch physically larger executors if necessary. The implementation involved extending core Spark classes like Task and TaskSetManager and updating the SparkUI. They developed a comprehensive dashboard to monitor the impact, measuring cost savings and job recovery rates. The rollout was staged, starting with ad-hoc jobs and then gradually incorporating scheduled jobs in tiers. The rollout reduced OOM errors and improved resource utilization within the Spark cluster.
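
The retry idea can be sketched outside Spark as a loop that reruns a failed task under successively larger memory profiles. This is a conceptual illustration only and does not use Spark's scheduler classes. `TaskOOM`, `run_with_memory_retries`, and the profile sizes are hypothetical names and values.

```python
class TaskOOM(Exception):
    """Raised by a task that ran out of memory (hypothetical signal)."""


def run_with_memory_retries(task, profiles_mb):
    """Run a task under successively larger memory profiles until it fits.

    task: callable taking memory_mb and raising TaskOOM on out-of-memory.
    Returns (result, memory_mb) for the first profile that succeeds.
    """
    if not profiles_mb:
        raise ValueError("at least one memory profile is required")
    last_error = None
    for memory_mb in profiles_mb:
        try:
            return task(memory_mb), memory_mb
        except TaskOOM as err:
            last_error = err  # escalate to the next, larger profile
    raise last_error
```

Only the tasks that actually fail pay for the larger profile, so the rest of the job keeps running at its original, cheaper memory setting.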

https://medium.com/pinterest-engineering/drastically-reducing-out-of-memory-errors-in-apache-spark-at-pinterest-c55d7dac2257?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Feb 17

GPU-Serving Two-Tower Models for Lightweight Ads Engagement Prediction

Pinterest developed a new GPU-serving two-tower model for ads lightweight ranking. This model employs an MMOE-DCN architecture to balance performance and serving latency. The lightweight ranking stage efficiently narrows down ad candidates for downstream models. This new architecture replaced the previous MTMD model and included feature updates. They achieved a 5-10% reduction in offline loss for CTR prediction. Further segmentation of standard and shopping ads also improved loss reduction and model iteration speed. Training efficiency was improved through dataloader optimizations, model code adjustments, and training configuration tuning. Evaluation utilized KL divergence loss, and the model was evaluated on auction winners and candidates. Online experiments showed significant reductions in CPC and increases in CTR. The project yielded substantial gains in offline and online metrics. This advancement signifies progress in scaling recommender systems with more complex and efficient models. The project was a collaborative effort across multiple teams at Pinterest.

https://medium.com/pinterest-engineering/gpu-serving-two-tower-models-for-lightweight-ads-engagement-prediction-5a0ffb442f3b?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Feb 13

Next Generation DB Ingestion at Pinterest

Pinterest created a new database ingestion framework to replace its slow, batch-oriented legacy system. The new framework uses Change Data Capture (CDC), Kafka, Flink, and Spark to ingest data in near real-time. This design offers lower latency and better efficiency than older methods. The system ingests changes from databases like MySQL and TiDB into CDC tables. Flink streams process these CDC events and store them in Iceberg tables. Spark jobs then periodically merge changes from CDC tables into base tables using "Merge Into" statements. Key optimizations include partitioning base tables and using bucket joins for efficiency. These techniques reduce compute costs and improve the speed of the upsert operations. The team standardized on the Merge-on-Read (MOR) approach for its advantages. The framework supports row-level deletions and provides native data compliance. Future work will focus on automated schema evolution within the framework.
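
The upsert semantics behind the periodic merge can be illustrated with a small pure-Python sketch. It is not the Spark "Merge Into" job itself: the event fields (`key`, `ts`, `op`, `row`) and the function name `merge_cdc` are assumptions made for illustration, and the real pipeline operates on Iceberg tables rather than dictionaries.

```python
def merge_cdc(base, events):
    """Apply CDC events to a base table keyed by primary key.

    base:   dict mapping primary key -> row
    events: iterable of dicts with hypothetical fields
            'key', 'ts', 'op' ('upsert' or 'delete'), and 'row'
    """
    # Keep only the most recent event per key, so older changes are ignored.
    latest = {}
    for event in events:
        current = latest.get(event["key"])
        if current is None or event["ts"] >= current["ts"]:
            latest[event["key"]] = event

    merged = dict(base)  # leave the input table untouched
    for key, event in latest.items():
        if event["op"] == "delete":
            merged.pop(key, None)  # row-level deletion
        else:
            merged[key] = event["row"]  # insert or update
    return merged


base_table = {1: {"name": "a"}, 2: {"name": "b"}}
cdc_events = [
    {"key": 1, "ts": 1, "op": "upsert", "row": {"name": "a2"}},
    {"key": 3, "ts": 2, "op": "upsert", "row": {"name": "c"}},
    {"key": 2, "ts": 3, "op": "delete", "row": None},
    {"key": 1, "ts": 5, "op": "upsert", "row": {"name": "a3"}},
]
merged_table = merge_cdc(base_table, cdc_events)
```

Deduplicating to the latest event per key before merging mirrors the usual requirement that a merge sees at most one source row per target row.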

https://medium.com/pinterest-engineering/next-generation-db-ingestion-at-pinterest-66844b7153b7?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Feb 5

Beyond Two Towers: Re-architecting the Serving Stack for Next-Gen Ads Lightweight Ranking Models…

The authors aimed to upgrade their ad serving system beyond the Two-Tower model to leverage more complex neural networks requiring a GPU-based inference stage. The primary challenge was integrating this new stage without increasing latency in their highly optimized serving funnel. They addressed the feature fetching bottleneck by bundling high-value candidate features directly within the model and employing a high-performance key-value store for others. Business logic, such as filtering and sorting, was moved into the model for efficiency, minimizing data transfer. Significant latency reduction was achieved through GPU optimizations, including multi-stream CUDA and kernel fusion. The authors also re-architected the retrieval data flow, returning essential metadata first and fetching the rest later. Further latency improvements came by introducing parallel paths for feature expansion. Finally, an unexpected shift in metrics emerged due to the switch from local to global ranking, requiring careful analysis and tuning to maintain performance. This transition represents a significant re-architecture effort to increase recommendation quality.

https://medium.com/pinterest-engineering/beyond-two-towers-re-architecting-the-serving-stack-for-next-gen-ads-lightweight-ranking-models-1992f2b76cbb?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Feb 2

Ads Candidate Generation using Behavioral Sequence Modeling

Pinterest's ads aim to inspire users and seamlessly integrate into their shopping journeys. Understanding rapidly evolving user behavior is key to surfacing relevant ads. Traditional targeting methods often miss the nuances of user intent. The Pinterest Ads team developed advanced behavioral sequence modeling for improved ad candidate generation. Initially, a transformer-based model predicted advertisers users would interact with next. This advertiser-level model achieved significant lifts in conversion volume and reduced CPA in production. Building on this, the team developed an item-level model to predict specific products users would engage with. This model uses rich Pin embeddings and catalog metadata for granular representations. The item-level model also demonstrated substantial improvements in user checkout performance and reduced CPA. Learnings involved addressing popularity bias, handling sparse features, and optimizing sequence length.

https://medium.com/pinterest-engineering/ads-candidate-generation-using-behavioral-sequence-modeling-f9077ee1325d?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Jan 28

PinLanding: Turn Billions of Products into Instant Shopping Collections with Multimodal AI

Large online platforms face the challenge of organizing billions of items into navigable shopping collections. Historically, these collections relied on user search history and manual curation. However, multimodal large language models (LLMs) now enable generating collections directly from content, while still considering user search patterns. This paper introduces PinLanding, a production pipeline for shopping collection generation. PinLanding comprises four components: understanding user search intent, building a shopping collection vocabulary using LLMs, constructing feeds from attributes, and evaluating/evolving the system. User interaction data helps characterize shopping intents, revealing both high-volume searches and emerging long-tail conversational queries. A vision-language model generates initial product attributes, which are then curated into a compact vocabulary using statistical filtering, embedding-based clustering, and LLM-assisted review. A CLIP-style dual-encoder model is trained for scalable attribute assignment, efficiently mapping products to attributes. Ray is used for scalable batch inference in attribute assignment, and Spark constructs feeds by scoring product-topic relevance. The CLIP-based classifier shows superior performance on a fashion attribute prediction benchmark. Human evaluation demonstrates that PinLanding significantly improves precision in collection quality compared to traditional methods. The system has led to a four-fold increase in unique shopping topics and a 35% improvement in search performance. Future work involves integrating social trends and developing an AI-agent layer to handle emergent composite concepts.

https://medium.com/pinterest-engineering/pinlanding-turn-billions-of-products-into-instant-shopping-collections-with-multimodal-ai-3489320294e9?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Jan 13

LLM-Powered Relevance Assessment for Pinterest Search

Pinterest Search developed a method to enhance search relevance evaluation using Large Language Models (LLMs). Traditional relevance measurement relied on costly human annotations, limiting the scale and sensitivity of A/B experiments. To address this, they fine-tuned open-source LLMs on human-labeled data to predict Pin relevance to queries. This LLM-based approach treats relevance prediction as a multiclass classification problem, utilizing features like Pin titles, descriptions, and image captions. They adopted a stratified query sampling design, which reduces the Minimum Detectable Effect (MDE) by an order of magnitude. This new methodology enables the measurement of heterogeneous treatment effects and improves evaluation efficiency. The LLM labeling process significantly lowers costs and time, allowing for larger and more representative sample sizes. After fine-tuning, the LLM-based relevance model generates relevance scores, which are then used to compute metrics like sDCG@K. Rigorous validation showed high alignment between LLM-generated labels and human annotations, with an exact match rate of 73.7% and strong rank-based correlations. This alignment holds even for queries of different popularity segments. The LLM-based relevance assessment proved effective for non-English queries as well, maintaining strong correlations and low bias. By transitioning to LLM-based relevance assessment, Pinterest Search has been able to scale up their evaluation query sets and improve the quality of relevance metrics for online experiment evaluation. This has led to a significant reduction in manual annotation efforts and enhanced the overall efficiency of their A/B testing process. The chosen LLM, XLM-RoBERTa-large, offers a good balance of prediction quality and inference efficiency.

https://medium.com/pinterest-engineering/llm-powered-relevance-assessment-for-pinterest-search-b846489e358d?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Dec 10, 2025

How Pinterest Built a Real‑Time Radar for Violative Content using AI

Pinterest uses a metric called prevalence to measure policy-violating content, defined as the percentage of all views that went to harmful content. Prevalence complements user reports by identifying under-reported harms and tracking trends. Historically, reliance on human review for measuring prevalence was slow and expensive. To address this, Pinterest developed an AI-assisted workflow for daily prevalence measurement. This involves sampling user impressions and using a multimodal LLM for large-scale labeling. The LLM, guided by expert prompts and subject matter experts, significantly reduces latency and cost while maintaining accuracy. Prevalence is calculated daily, with confidence intervals, and can be broken down by policy areas, sub-policies, and content surfaces. The system uses risk scores from enforcement models for efficient sampling, but these scores do not act as labels. Inverse-probability weighting ensures the prevalence statistic accurately reflects user impressions over time, even with enforcement threshold changes. Machine learning is crucial for unbiased sampling and efficient labeling, allowing for faster risk detection and proactive responses. This data-driven approach enables quicker product iterations, informed policy development, and strategic decision-making, including setting goals and allocating resources effectively. Challenges like wide confidence intervals for rare categories or policy drift are managed through adaptive sampling and continuous monitoring. Future plans include expanding pivoting capabilities, optimizing LLM usage, and refining human-in-the-loop processes for enhanced accuracy and reduced bias.
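
A minimal sketch of inverse-probability weighting for a prevalence estimate follows. It assumes each sampled impression carries the probability with which it was sampled and a binary label from the labeling step; the function name, the input layout, and the choice of a self-normalized (ratio) estimator are assumptions for illustration, not details from the article.

```python
def ipw_prevalence(samples):
    """Estimate prevalence from non-uniformly sampled impressions.

    samples: iterable of (is_violating, sampling_probability) pairs.
    Each sample is weighted by 1 / sampling_probability, so impressions
    that were unlikely to be sampled count for more.
    """
    weighted_violating = 0.0
    weighted_total = 0.0
    for is_violating, probability in samples:
        if not 0.0 < probability <= 1.0:
            raise ValueError("sampling probability must be in (0, 1]")
        weight = 1.0 / probability
        weighted_total += weight
        if is_violating:
            weighted_violating += weight
    if weighted_total == 0.0:
        raise ValueError("no samples provided")
    return weighted_violating / weighted_total


# High-risk impressions sampled at 0.5, low-risk impressions at 0.125.
labeled_samples = [
    (True, 0.5),
    (True, 0.5),
    (False, 0.5),
    (False, 0.125),
    (False, 0.125),
]
estimate = ipw_prevalence(labeled_samples)
naive = sum(1 for is_violating, _ in labeled_samples if is_violating) / len(labeled_samples)
```

In this toy data the unweighted share of violating samples overstates prevalence because risky impressions were oversampled; the weighted estimate corrects for that.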

https://medium.com/pinterest-engineering/how-pinterest-built-a-real-time-radar-for-violative-content-using-ai-d5a108e02ac2?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Dec 8, 2025

Improving Quality of Recommended Content through Pinner Surveys

In 2023, Pinterest committed to the Inspired Internet Pledge, focusing on user wellbeing and a safer online experience. Pinterest is using Pinner surveys to understand user perception of content quality and improve platform content. These surveys directly ask users to rate images, providing valuable feedback for improving recommendations. The goal is to promote high-quality content that inspires users and drives long-term engagement. Pinterest avoids promoting low-quality "clickbait" by training recommendation systems with survey data. A machine-learning model was developed to predict visual quality based on Pinner ratings, using image embeddings and a pairwise ranking approach. The model was trained on a dataset of 5,000 Pins, with multiple ratings per image to reduce noise. Offline evaluations demonstrated the model's ability to distinguish between higher and lower quality content. This model helped Pinterest de-bias the system, ensuring that rewarded engagement comes from high-quality content, aligning with their "Pinners First" value. The model has been successfully implemented across Pinterest's major surfaces, including Homefeed, Related Pins, and Search.
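
One common way to set up pairwise ranking over embeddings is a logistic loss on the score difference between two images. The sketch below assumes that form and a linear scorer; the article summary does not specify the actual loss or model, so both function names and the formulation are illustrative.

```python
import math


def score(weights, embedding):
    """Linear quality score over an image embedding (illustrative scorer)."""
    return sum(w * x for w, x in zip(weights, embedding))


def pairwise_logistic_loss(score_preferred, score_other):
    """Loss is small when the higher-rated image also gets the higher score."""
    margin = score_preferred - score_other
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Pinners rated image A above image B in this hypothetical pair.
weights = [0.5, -0.25]
embedding_a = [2.0, 0.0]
embedding_b = [0.0, 2.0]
loss = pairwise_logistic_loss(score(weights, embedding_a), score(weights, embedding_b))
```

Training on pairs rather than absolute ratings only asks the model to order images correctly, which tends to be more robust to rater-specific scales.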

https://medium.com/pinterest-engineering/improving-quality-of-recommended-content-through-pinner-surveys-eebca8a52652?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Dec 5, 2025

On the (re)-prioritization of open-source AI

Pinterest is shifting its AI investments towards fine-tuned open-source models to achieve similar performance at a lower cost. Open-source models are improving, especially in cost efficiency against their performance. Pinterest finds that compact, fit-for-purpose models outperform general-purpose LLMs on specific tasks. This approach allows them to leverage domain-specific data and product integration for differentiation. User modeling, visual, and text foundation models are built, bought, or adapted based on modalities. The shift to open-source is driven by the leveling of model capabilities and the emphasis on fine-tuning. Pinterest’s assistant leverages Pinterest-native tools and an LLM for query understanding and tool calling. Advantages include reduced costs, better personalization, and the ability to align models with brand values. Pinterest will continue using a mix of internally developed, fine-tuned open-source, and third-party AI models. The long-term strategy involves leveraging data to build efficient models and partnering to address capability gaps.

https://medium.com/pinterest-engineering/on-the-re-prioritization-of-open-source-ai-86f7279481e3?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Dec 4, 2025

Autonomous Observability at Pinterest (Part 1 of 2)

Pinterest faced a fragmented observability system where logs, traces, and metrics were siloed. This hindered a holistic understanding of platform issues, forcing engineers to navigate multiple interfaces. The team adopted a "shift-left" and "shift-right" approach to improve instrumentation and production monitoring. To overcome data fragmentation, they embraced AI and context engineering, specifically using the Model Context Protocol (MCP). An MCP server was developed to unify disparate observability signals like metrics, logs, traces, and change events. This solution allows AI agents to access and correlate data without a complete infrastructure overhaul. The MCP server provides unified access to various data pillars, offering fine-grained context control and plug-and-play extensibility. It acts as a hub for agentic observability experiences, empowering teams to build context-aware tools. Challenges arose from model context size limitations due to the massive volume of data processed. Solutions included generating direct links to relevant dashboards or providing more specific tool documentation to AI agents.

https://medium.com/pinterest-engineering/autonomous-observability-at-pinterest-part-1-of-2-eb0adae830ba?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Dec 3, 2025

Slashing CI Wait Times: How Pinterest Cut Android Testing Build Times by 36%+

Android end-to-end testing builds at Pinterest were slow and unreliable due to unbalanced test shards and platform limitations. The team first evaluated third-party solutions but found them inadequate for their needs. They decided to build an in-house testing platform called PinTestLab, hosted on EC2 emulators. This platform allowed for complete control over the testing stack and infrastructure. The core innovation is a runtime-aware sharding mechanism. This system uses historical test duration and stability data to pack tests into shards. The goal is to ensure that each shard has a similar total runtime. This approach differs from simply balancing the number of tests per shard. Previously, package-based sharding led to imbalances where a single slow shard would delay the entire build. Even simple time-based sorting failed to account for emulator idle time. The new runtime-aware sharding algorithm works by sorting tests by average runtime and then greedily assigning each test to the emulator projected to finish earliest. This keeps all emulators busy and minimizes the time difference between the fastest and slowest shards. The impact of this solution has been significant. End-to-end build times were reduced by nine minutes, a 36% improvement. The runtime of the slowest shard decreased by 55%. The time difference between the fastest and slowest shards was dramatically compressed from 597 seconds to just 130 seconds. This boosts developer velocity by providing faster and more reliable feedback.
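
The greedy assignment described above can be sketched with a min-heap keyed on each shard's projected finish time. This is an illustration under stated assumptions, not PinTestLab's code: the function name and sample runtimes are hypothetical, tests are sorted longest-first (the summary only says "sorted by average runtime"), and stability data is ignored.

```python
import heapq


def shard_tests(avg_runtimes, num_shards):
    """Pack tests into shards with similar total runtime.

    avg_runtimes: dict mapping test name -> average runtime in seconds
    Returns (shards, totals), where totals[i] is the projected runtime of shards[i].
    """
    # Heap entries are (projected_total_runtime, shard_index).
    heap = [(0.0, index) for index in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]

    # Longest tests first (an assumption), each to the shard that finishes earliest.
    ordered = sorted(avg_runtimes.items(), key=lambda item: item[1], reverse=True)
    for name, runtime in ordered:
        total, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (total + runtime, index))

    totals = [sum(avg_runtimes[name] for name in shard) for shard in shards]
    return shards, totals


sample_runtimes = {"a": 90, "b": 60, "c": 50, "d": 40, "e": 30, "f": 30}
sample_shards, sample_totals = shard_tests(sample_runtimes, num_shards=2)
```

Placing the longest tests first leaves the short ones to even out the remaining gaps, which is why the spread between shards stays small.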

https://medium.com/pinterest-engineering/slashing-ci-wait-times-how-pinterest-cut-android-testing-build-times-by-36-feb6ff121d91?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Nov 10, 2025

A Decade of AI Platform at Pinterest

Pinterest's AI journey evolved from fragmented machine learning stacks to a unified AI Platform. Early ML efforts involved individual teams building custom solutions, leading to redundancy and training-serving skew. Linchpin DSL and Scorpion inference service were early attempts at unification, but faced limitations with evolving technologies. A small ML Platform team struggled to drive adoption without organizational alignment and incentives. EzFlow aimed to improve training orchestration, but adoption was slow due to product teams' focus on immediate metrics. Seed bets like PySpark, Training Compute Platform, and Galaxy laid the foundation for future advancements. DNNs emerged in recommendation systems, with teams like Home Feed building solutions like AutoML, which exposed brittle foundations. Adoption was driven by organizational alignment, product goals, and industry momentum. Efficiency became a limiter, demanding deeper collaboration between modeling and platform teams as transformer models and GPUs reshaped infrastructure.

https://medium.com/pinterest-engineering/a-decade-of-ai-platform-at-pinterest-4e3b37c0f758?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Nov 4, 2025

Identify User Journeys at Pinterest

Pinterest aims to become an inspiration-to-realization platform, understanding users' long-term goals. They introduce user journeys, defined by interests, intent, and context, to achieve this. These journeys are inferred by analyzing user interactions, moving beyond simple content recommendations. The system, built with a "lean" approach, clusters keywords from user data to identify journeys. Dynamic keyword extraction and hierarchical clustering are used to generate flexible and personalized journeys. Journey naming, expansion, ranking, and diversification are then applied to enhance user experience. A stage prediction model determines the journey's lifecycle for appropriate notifications. The output is a list of distinct user journeys with names, keywords, stage, and confidence scores. LLMs are used to evaluate journey relevance and guide system improvements. Experiments with journey-aware notifications showed significantly improved user engagement. Furthermore, Pinterest is actively leveraging LLMs to simplify and improve journey inference overall. The company is actively fine-tuning LLMs and implementing scalable batch inference for efficient execution.

https://medium.com/pinterest-engineering/identify-user-journeys-at-pinterest-b517f6275b42?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Oct 21, 2025

Tracking Down Mysterious ML Training Stalls

Pinterest's ML training platform, MLEnv, encountered a significant performance drop after a PyTorch version upgrade. This issue led to a more than 50% reduction in training throughput. The debugging process began by examining the GPU roofline throughput. This measurement revealed a 20% performance decrease even when excluding the data loader. Further analysis focused on individual model modules to pinpoint the source of the slowdown. A specific transformer module, module A, was identified as the primary culprit. The PyTorch profiler showed that CompiledFunctions, previously present, were now missing for this module in the upgraded version. Investigation into torch.compile revealed a log indicating that a non-infrastructure PyTorch dispatch mode was present, which torch.compile did not support. Minimal reproducible scripts confirmed that this issue manifested specifically within the trainer class. The problematic component was identified as a context manager used for FLOPs counting, enabled by default. Disabling this context manager resolved the torch.compile issue, restoring CompiledFunctions. However, this fix did not improve end-to-end throughput. The focus shifted back to the data loading and distributed training aspects, ruling out Ray.data as the cause by observing the same GPU roofline throughput issues even when running as a native PyTorch application. Several observations pointed to intermittent slow iterations, a straggler effect during synchronization, and a peculiar behavior where enabling Nvidia's Nsight Systems profiler eliminated the slowness. Testing on a single GPU confirmed distributed training was not the root cause. Disabling torch.compile entirely in the Ray setup restored original throughput, suggesting that graph breaks within torch.compile were related to the slowdowns. Creating a minimal reproducible model with extensive graph breaks led to the observation of recurring slow iterations.
Nsight Systems traces revealed that the main training thread was holding the Global Interpreter Lock (GIL) during these slow iterations, but this did not explain the entire pause. Further analysis using the Linux perf tool and visualizing the traces with chrome://tracing highlighted a suspicious Python process. This process was executing an expensive computation, specifically a Linux kernel call named smap_gather_stats, which gathers virtual memory statistics.

https://medium.com/@Pinterest_Engineering/tracking-down-mysterious-ml-training-stalls-5290bb19be6d?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Oct 17, 2025

Next Gen Data Processing at Massive Scale At Pinterest With Moka (Part 2 of 2)

Pinterest is developing Moka, a next-generation data processing platform, to replace its aging Hadoop-based system. This platform is deployed on AWS Elastic Kubernetes Service (EKS) across four environments: test, dev, staging, and production. Terraform, augmented by custom AWS modules and Helm charts, manages the EKS cluster deployments. A critical component of Moka is its logging infrastructure, which utilizes Fluent Bit to collect and export logs from EKS control planes, Spark applications, and system pods to Amazon S3. Fluent Bit is configured to group Spark application logs by a unique job ID and to parse YuniKorn logs for resource usage summaries. For observability, Pinterest employs a Prometheus-compatible framework to gather metrics. They developed a custom sidecar, kubemetricsexporter, to bridge their existing TSDB-based Statsboard system with Prometheus metrics. The OpenTelemetry Collector is used to receive, process, and export telemetry data, with a specific pipeline configured for Prometheus metrics. This robust infrastructure aims to ensure efficient and reliable data processing at massive scale for Pinterest.

https://medium.com/pinterest-engineering/next-gen-data-processing-at-massive-scale-at-pinterest-with-moka-part-2-of-2-d0210ded34e0?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Sep 10, 2025

Developer Experience at Pinterest: The Journey to PinConsole

Pinterest faced engineering velocity challenges due to increasing technological complexity as its user base grew. The company recognized that its decentralized tool adoption strategy created bottlenecks and an overwhelming landscape for new engineers. To address this, Pinterest decided to reimagine its developer experience by building an Internal Developer Platform called PinConsole. PinConsole is a unified developer portal built on the open-source Backstage platform. This platform approach aims to create a consistent abstraction layer, allowing engineers to focus on business logic rather than infrastructure. After evaluating various solutions, Pinterest chose Backstage for its strong community adoption, extensible plugin architecture, and active development. PinConsole integrates with Pinterest’s internal authentication systems and LDAP for a unified entity model. The architecture utilizes PostgreSQL databases for data storage and applies Pinterest's Gestalt design system for UI consistency. A key component is the PinCompute plugin, a custom Kubernetes integration that simplifies managing workloads using Pinterest-specific abstractions. Personalized homepage widgets, like the GitHub integration, further enhance the developer experience by reducing context switching and providing relevant information.

https://medium.com/pinterest-engineering/developer-experience-at-pinterest-the-journey-to-pinconsole-b34ac9e3bdd9?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Aug 22, 2025

Debugging the One-in-a-Million Failure: Migrating Pinterest’s Search Infrastructure to Kubernetes

Pinterest's search infrastructure, Manas, was migrated to Kubernetes, but a performance issue was discovered where one in every million search requests took 100 times longer than usual. The issue was investigated, and it was found that a monitoring process, cAdvisor, was causing the problem. cAdvisor was scanning the entire page table every 30 seconds to calculate the total bytes of memory referenced by a process, which was causing contention with the memory-intensive leaf processing in Manas. This was causing the latency spikes in the search requests. The investigation involved profiling search systems, debugging performance issues, Linux kernel features, and memory management. The root cause was identified as cAdvisor's working set size (WSS) estimation, which was enabled by default and was causing the memory contention. The issue was resolved by disabling cAdvisor's WSS estimation for all PinCompute nodes. This fix was a major milestone for Pinterest's Kubernetes platform, allowing other online services to be moved to the platform. The investigation highlighted the importance of resource isolation, narrowing the problem space, and using blackbox debugging strategies. The experience also showed that sometimes, a good enough solution is sufficient, and it's not necessary to find an exact solution to move forward.

https://medium.com/pinterest-engineering/debugging-the-one-in-a-million-failure-migrating-pinterests-search-infrastructure-to-kubernetes-bef9af9dabf4?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Jul 16, 2025

Next Gen Data Processing at Massive Scale At Pinterest With Moka (Part 1 of 2)

Pinterest's Data Engineering team is building a new massive scale data processing platform to replace their current Hadoop-based platform, Monarch. The team explored Kubernetes-based systems as a replacement due to their growing popularity and increasing adoption in the Big Data community. The new platform had to meet certain criteria, including extensive support for containers, execution of Pinterest's custom Spark fork, and lower operational and maintenance costs. The team performed a comprehensive evaluation of running Spark on various platforms and leaned towards Kubernetes-focused frameworks due to their advantages, including container-based isolation and security, ease of deployment, and built-in frameworks. Kubernetes provides more fine-grained support for container management and deployment than other systems, but lacks built-in support for data management, storage, and processing. The team's current deployment model in Hadoop is cumbersome, and they are moving towards a more straightforward approach using Terraform, container images, and Helm. The new platform will leverage Kubernetes and EKS to replace Monarch, introducing several challenges, including integrating EKS into the existing Pinterest environment and finding replacements for Hadoop components. The team has built a new platform, Moka, which is able to process batch Spark workloads that only access non-sensitive data, and will add more functionality in the future. The initial high-level design of Moka includes a system that can process batch Spark workloads, with jobs submitted and processed through a series of components, including Spinner, Archer, and the Spark Operator. The team will provide more details on the core application-focused aspects of their platform in the next part of their blog series.

https://medium.com/pinterest-engineering/next-gen-data-processing-at-massive-scale-at-pinterest-with-moka-part-1-of-2-39a36d5e82c4?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Jul 11, 2025

Scaling Pinterest ML Infrastructure with Ray: From Training to End-to-End ML Pipelines

At Pinterest, ML engineers face challenges in optimizing feature development, sampling strategies, and label experimentation due to slow data pipelines, costly feature iterations, and inefficient compute usage. To address these challenges, Pinterest expanded Ray's capabilities beyond training to feature development, sampling, and label modeling. Pinterest introduced a Ray-native ML infrastructure stack, focusing on four major improvements: building a Ray Data native pipeline API, efficient data joining with Iceberg Bucket Joins, data persistence for efficient iteration, and Ray Data optimizations for large workloads. The new Ray-powered ML workflow reduces ML iteration times by 10X while significantly cutting infrastructure costs. The Ray Data native pipeline API enables feature development, sampling, and label transformations natively in Ray, eliminating the need for Spark backfills. Iceberg Bucket Joins enable fast and efficient feature joins across different sources without precomputing large tables. Data persistence allows for efficient iteration by caching transformed features and reusing them when applicable. The Ray Data optimizations achieved a 2-3X speedup across different pipelines, and the new workflow has unlocked a more scalable, efficient, and cost-effective ML infrastructure at Pinterest.

https://medium.com/pinterest-engineering/scaling-pinterest-ml-infrastructure-with-ray-from-training-to-end-to-end-ml-pipelines-4038b9e837a0?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Jun 24, 2025

Unlocking Efficient Ad Retrieval: Offline Approximate Nearest Neighbors in Pinterest Ads

Pinterest uses online approximate nearest neighbors (ANN) for ad retrieval, but offline ANN is also valuable for large-scale data processing and cost-effective operations. Offline ANN precomputes candidates ahead of time, which suits scenarios that need high throughput and low-latency query responses and have a relatively static query context. Pinterest has successfully applied online ANN, but faces challenges with expanding ads inventory. Migrating from the Hierarchical Navigable Small World (HNSW) algorithm to the Inverted File (IVF) algorithm enables a larger tier index, but increases costs. Offline ANN benefits from ample computational resources and latency tolerance, and is effective for candidate generators with static query contexts. The primary difference between online and offline approaches is the timing of the ANN search. Offline ANN has pros, including cost efficiency and extensibility, but also cons, including real-time limitations and fixed neighbors. Pinterest has evaluated offline ANN-based retrieval in several use cases, including similar item ads and visual embedding. Offline ANN has shown better engagement and conversion performance, and Pinterest is actively developing its own offline ANN framework and platform for future advancements.

https://medium.com/pinterest-engineering/unlocking-efficient-ad-retrieval-offline-approximate-nearest-neighbors-in-pinterest-ads-6fccc131ac14?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Jun 12, 2025

Next-Level Personalization: How 16k+ Lifelong User Actions Supercharge Pinterest’s Recommendations

Pinterest's home feed is crucial for user engagement and discovery, and it uses a two-stage process to rank pins based on user interests and personalized pin relevance. The Pinnability model uses a neural network to consume various pin, context, and user signals, but it has limitations in modeling lifelong user behavior. The TransActV2 model addresses these challenges by leveraging long user sequences, integrating a Next Action Loss function, and employing scalable deployment solutions. TransActV2 can model up to 16,000 user actions, integrates explicit action features, and stores actions losslessly using int8 quantization. The model uses a multi-headed, point-wise multi-task network over a wide and deep stack, and introduces a Next Action Loss function to enhance user action forecasting. The NAL function challenges the model to predict not just engagement probability but also what the user will do next. The model achieves significant improvements in offline and online metrics, including a 13.31% increase in top-3 repin hit and a 6.35% increase in repin. The model's industrial-scale engineering enables efficient serving and deployment, achieving 75-81% lower p99 model run latency and 103-338x end-to-end inference latency reduction. The real-world impact of TransActV2 is massive, with millions more meaningful engagements and significant improvements in user experience.

https://medium.com/pinterest-engineering/next-level-personalization-how-16k-lifelong-user-actions-supercharge-pinterests-recommendations-bd5989f8f5d3?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Jun 6, 2025

Automated Migration and Scaling of Hadoop™ Clusters

Pinterest's big data infrastructure uses Hadoop YARN on AWS with Auto Scaling Groups (ASGs) to process large amounts of data. The company uses Terraform to create and manage clusters, but scaling in (downsizing) is a complex process that requires manual steps. To simplify this process, Pinterest introduced the Hadoop Control Center (HCC), which allows for automatic scaling in and out of clusters. Before HCC, scaling in involved a tedious and error-prone process of selecting nodes to decommission, adding them to exclude files, and then terminating them. HCC streamlines this process by allowing users to specify the desired ASG size, and the tool handles the rest. HCC also integrates other useful tools for cluster management, including displaying node status, reporting on YARN applications, and showing subnet and security group details. HCC's architecture consists of a manager node and worker nodes, with the manager acting as an intermediary and cache. The Hadoop Operations Server (HOS) is the core of HCC, which does the heavy lifting of updating JMX cache, maintaining fabric connections, and updating excludes files. HCC periodically queries and consolidates JMX data to make decisions about what to do, and it manages the process of decommissioning nodes.
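
The scale-in flow described above (pick nodes, add them to the exclude file, then terminate) can be sketched in Python as follows. The summary does not say how HCC chooses nodes, so the "least busy first" rule, the function names, and the input shape are hypothetical; the only format relied on is that a YARN exclude file lists one hostname per line.

```python
from typing import Dict, List


def plan_scale_in(running_containers: Dict[str, int], desired_size: int) -> List[str]:
    """Pick the hosts to decommission so the cluster shrinks to desired_size.

    running_containers maps hostname -> number of running YARN containers.
    Choosing the least busy hosts first is an assumed policy, not HCC's.
    """
    if desired_size < 0:
        raise ValueError("desired_size must be non-negative")
    surplus = len(running_containers) - desired_size
    if surplus <= 0:
        return []  # already at or below the target size
    # Sort by load, then by hostname so the plan is deterministic.
    by_load = sorted(running_containers, key=lambda host: (running_containers[host], host))
    return by_load[:surplus]


def render_exclude_file(hosts: List[str]) -> str:
    """Render the exclude-file body: one hostname per line."""
    return "".join(f"{host}\n" for host in hosts)
```

In this sketch, termination would only happen after the excluded nodes finish decommissioning; that waiting step is left out.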

https://medium.com/pinterest-engineering/automated-migration-and-scaling-of-hadoop-clusters-69c0967228e4?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Jun 5, 2025

Adopting Docs-as-Code at Pinterest

Pinterest's internal developer surveys revealed that technical documentation is a top pain point, with issues boiling down to quality and discoverability. Traditional solutions, such as doc-a-thons and passionate appeals from senior leaders, have not produced lasting improvements. In 2021, Pinterest decided to try a new approach, exploring different strategies to enhance documentation tools and processes, with a focus on the "docs-as-code" strategy. This initiative, called PDocs, aimed to elevate the quality of technical documentation and transform the culture of documentation at Pinterest. The "docs-as-code" philosophy involves writing documentation using the same processes as code, including using markup languages, source control, code review tools, and static site generators. By adopting this strategy, Pinterest aimed to solve documentation problems, encouraging good documentation practices, quality control, and discoverability. PDocs, a custom-built static site generator, was developed to automatically colocate documentation projects from various file paths and repositories, generating a single centralized doc site. PDocs allows for a developer experience where engineers can drop a simple config and Markdown file in any repository, and have it show up in the centralized doc site once merged. The PDocs UI was designed to be project-centric, with features like favoriting, recently viewed, and a "published" or "draft" setting to maintain reader trust.

https://medium.com/pinterest-engineering/adopting-docs-as-code-at-pinterest-4f18ad169c25?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Jun 3, 2025

Healthier Personalization with Surveys

Pinterest is a unique platform where users, known as Pinners, come to find inspiration and ideas for various aspects of their lives. The platform's goal is to provide a personalized experience, showing users content that is relevant to their interests and searches. Pinterest's approach to personalization is different from other platforms, as it prioritizes quality time over time spent on the platform. The company believes that a balance between different approaches to content ranking is necessary, incorporating explicit engagement signals, community guidelines, and survey-based personalization. Pinterest uses surveys to gather feedback from users and create a healthier and more inspirational experience. The platform's surveys are designed to be rigorous and effective, with a team of experts ensuring that the surveys are well-designed and useful. The surveys have been instrumental in helping Pinterest create a positive and inspirational experience for users, with recent research showing that the platform leads the industry in terms of its impact on user wellbeing. Pinterest's approach to personalization is guided by the principles of the Inspired Internet Pledge, which calls for companies to prioritize user wellbeing and create a healthier internet experience. By using surveys and prioritizing user wellbeing, Pinterest is proving that it is possible to create a safer and healthier online experience. Overall, Pinterest's unique approach to personalization and its commitment to user wellbeing set it apart from other social media platforms.

https://medium.com/pinterest-engineering/healthier-personalization-with-surveys-65177cf9bea8?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • May 30, 2025

Modernizing Home Feed Pre-Ranking Stage

Pinterest's home feed recommendation system has adopted a multi-stage design, and the team has achieved a significant milestone with a sophisticated pre-ranking layer that improved business metrics. The initial design had limitations, including deployment efforts, model auto-retraining challenges, and a two-tower architecture that couldn't learn item interactions effectively. The team has made foundational improvements to modernize the pre-ranking layer, including a new system design, logging pipeline, and serving architecture. The new design includes a request-level sub-component and an item-level sub-component that are jointly trained and decoupled during serving. The team has also implemented an early funnel logging pipeline to distinguish pre-ranking from ranking and to bring unbiased data into training. The serving architecture design includes a root-leaf architecture to mitigate CPU and memory overhead. The team has also adopted model distillation to better align the pre-ranking model with the L2 ranker. Online experiments have shown significant engagement wins, and the team has also worked on setting up an auto-retraining framework to leverage fresh engagement data. The team is continuing to work on modeling innovations, data sampling, model architecture improvement, loss exploration, and serving optimization.

https://medium.com/pinterest-engineering/modernizing-home-feed-pre-ranking-stage-e636c9cdc36b?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • May 29, 2025

How Pinterest Accelerates ML Feature Iterations via Effective Backfill

At Pinterest, the mission is to inspire users to curate a life they love, which relies on state-of-the-art Recommendation and Ads models trained on tens of petabytes of data. These models drive personalized recommendations, showing users content that resonates with their interests. Experimenting with features is a common task, and the first step is integrating new features into the training dataset. The most straightforward method of incorporating features is Forward Logging, but this method presents challenges such as high calendar day cost, high development time cost, lack of isolation, and resource wastage and instability. Feature Backfill is an alternative to forward logging that is commonly used to address these challenges. In this blog post, the authors explore how they created their Feature Backfill Solution, leveraging various techniques to reduce costs and iteration time by up to 90x. The authors developed an initial backfill solution using Spark to materialize features within their training tables; it operates as a reusable Airflow DAG that ML engineers trigger on demand. However, this solution has limitations: it does not support concurrent backfills, it has a high compute cost, and it requires manual partition management. To address these limitations, the authors developed a v2 version that adopts a two-stage backfill approach, streamlining the process into two key stages: Feature Staging and Feature Promotion.

https://medium.com/pinterest-engineering/how-pinterest-accelerates-ml-feature-iterations-via-effective-backfill-d67ea125519c?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • May 19, 2025

500X Scalability of Experiment Metric Computing with Unified Dynamic Framework

Pinterest's experimentation platform, Helium, runs daily experiments generating insights for product decisions and business strategies. However, as the scale of experimentation grew, challenges arose, including delays in upstream data ingestion, difficulties in backfilling skipped metrics, and frequent scalability issues. To address these challenges, Pinterest developed the Unified Dynamic Framework (UDF), a scalable and resilient solution that has transformed how experiment metrics are computed. UDF supports 100X more metrics and is designed to scale to 500X in the future, accelerating metric delivery and reducing engineering effort from months to days. The framework achieves standardization of metric processing, offloading infrastructure challenges and pipeline creation complexities. UDF addresses upstream dependencies, backfill complexity, and scalability issues, enabling faster experimentation and innovation. The framework has improved developer velocity, flexibility, scalability, speed, and reliability, driving innovation and business outcomes. The standardization of metric computing across the experimentation platform has led to immense improvements, empowering experimentation and delivering value to users. The UDF has revolutionized experiment metric computing at Pinterest, and its impact will continue to grow in the future.

https://medium.com/pinterest-engineering/500x-scalability-of-experiment-metric-computing-with-unified-dynamic-framework-9eb356fee676?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • May 13, 2025

Multi-gate-Mixture-of-Experts (MMoE) model architecture and knowledge distillation in Ads…

The Multi-gate Mixture-of-Experts (MMoE) model architecture improves ad engagement modeling by dynamically allocating resources to specialized sub-networks (experts). This improves efficiency, generalization, and multi-task learning compared to single models. MMoE leverages experts with diverse architectures like DCNv2, MaskNet, and FinalMLP, strategically chosen based on performance and cost. The model also utilizes mixed precision inference and lightweight gate layers to reduce infrastructure costs without sacrificing performance. Knowledge distillation further enhances the model by transferring knowledge from existing production models to new models. This mitigates performance gaps caused by limited data retention periods and allows new models to learn from unavailable historical data. Distillation improves both offline and online metrics significantly, surpassing the baseline DCNv2 model. The technique is beneficial during both batch training and model retraining scenarios, such as feature upgrades. However, distillation is removed during incremental training to prevent overfitting. The combined approach of MMoE and knowledge distillation leads to substantial improvements in ad matching quality and user experience. This results in more relevant recommendations and improved user engagement on the platform.
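
As a rough illustration of the gating idea, the Python sketch below runs one MMoE forward pass: every expert sees the same input, and each task has its own gate that produces softmax weights for mixing the expert outputs. The experts here are toy functions standing in for architectures such as DCNv2 or MaskNet, and the function names and linear gates are assumptions rather than details from the article.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Matrix-vector product; weights has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mmoe_forward(
    x: Sequence[float],
    experts: Sequence[Callable[[Sequence[float]], Vector]],
    gate_weights: Sequence[Sequence[Sequence[float]]],
) -> List[Vector]:
    """Return one mixed representation per task.

    gate_weights holds one matrix per task with shape (num_experts, len(x)).
    """
    expert_outputs = [expert(x) for expert in experts]
    dim = len(expert_outputs[0])
    task_outputs = []
    for gate in gate_weights:
        mix = softmax(linear(gate, x))  # one weight per expert, summing to 1
        mixed = [
            sum(mix[e] * expert_outputs[e][d] for e in range(len(expert_outputs)))
            for d in range(dim)
        ]
        task_outputs.append(mixed)
    return task_outputs
```

Because each task owns a gate, two tasks can weight the same experts differently for the same input, which is what separates MMoE from a single shared-bottom network.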

https://medium.com/pinterest-engineering/multi-gate-mixture-of-experts-mmoe-model-architecture-and-knowledge-distillation-in-ads-08ec7f4aa857?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Apr 24, 2025

Migrating 3.7 Million Lines of Flow Code to TypeScript

Pinterest migrated 3.7 million lines of code from Flow to TypeScript in eight months, resulting in better type safety, developer experience, and improved hiring. The company initially chose Flow over TypeScript in 2016 due to its gradual adoption and seamless integration with React. However, as the industry settled on TypeScript as the standard for JavaScript type checking, Pinterest decided to adopt it for its better community support, language features, and talent availability. The migration was done using a "big bang" approach, dividing the process into three phases: setup, conversion, and integration. The setup phase involved configuring TypeScript and @typescript-eslint, while the conversion phase involved migrating dependencies, running codemods, and suppressing ESLint errors. The integration phase focused on adapting existing systems to function within the new TypeScript environment. The company wrote a script to automate the entire process, minimizing merge conflicts and manual intervention. After validating the migration through daily automated testing, multiple rounds of manual testing, and byte-for-byte static analysis, Pinterest successfully rolled out the TypeScript branch. The company learned a lot from the open-source community and contributed to Stripe's flow-to-typescript codemod. Pinterest's experience serves as a valuable lesson for other companies considering a similar migration.

https://medium.com/pinterest-engineering/migrating-3-7-million-lines-of-flow-code-to-typescript-8a836c88fea5?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Apr 16, 2025

Handling Network Throttling with AWS EC2 at Pinterest

Pinterest, a visual search engine, runs on AWS and uses Amazon EC2 instances for its compute fleet. The company identified a significant challenge in managing its EC2 infrastructure, particularly for its online storage systems, due to a lack of clear insights into EC2's network performance and its impact on application reliability and performance. To address this, Pinterest developed network performance monitoring for its EC2 fleet and implemented techniques to manage network bursts, ensuring dependable network performance for critical online serving workloads. The company experienced issues with user sequence serving, which drove significant user engagement wins but resulted in serving latency and application timeouts. During an EC2 instance migration, Pinterest saw significant performance degradation across many clusters, leading to application timeouts. The company discovered that EC2 instances were experiencing network throttling due to microbursts that exceeded the network allowance. To make EC2 network throttling behavior more transparent, Pinterest upgraded its instances to access raw counters on an EC2 instance using tools like ethtool. The company modified its internal metrics collection agent to scrape these counters and ingest them into its metrics storage. By rolling out these ENA metrics to its entire EC2 fleet, Pinterest gained unprecedented visibility into AWS traffic shaping and implemented various optimizations to mitigate network throttling. The company also explored techniques to handle network bursts, including fine-grained S3 rate limiting, data backup tuning, and network compression.
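
A metrics agent along these lines could be sketched in Python as below. It parses text in the format printed by `ethtool -S <interface>` and keeps the ENA `*_allowance_exceeded` counters, which count packets queued or dropped when an instance exceeds a network allowance. The sample output, function names, and delta logic are illustrative assumptions; Pinterest's internal agent is not described in enough detail to reproduce.

```python
from typing import Dict

# Illustrative sample of `ethtool -S eth0` output on an ENA-backed instance.
SAMPLE_OUTPUT = """NIC statistics:
     tx_timeout: 0
     bw_in_allowance_exceeded: 12
     bw_out_allowance_exceeded: 0
     pps_allowance_exceeded: 3
     conntrack_allowance_exceeded: 0
     linklocal_allowance_exceeded: 0
"""

SUFFIX = "_allowance_exceeded"


def parse_allowance_counters(ethtool_output: str) -> Dict[str, int]:
    """Extract the allowance-exceeded counters from `ethtool -S` text."""
    counters = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not name.endswith(SUFFIX):
            continue
        try:
            counters[name] = int(value)
        except ValueError:
            continue  # skip lines whose value is not an integer
    return counters


def throttling_deltas(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Return counters that grew between two scrapes.

    The counters are cumulative, so a positive delta means throttling
    happened during the scrape interval.
    """
    deltas = {name: value - previous.get(name, 0) for name, value in current.items()}
    return {name: delta for name, delta in deltas.items() if delta > 0}
```

Scraping on a fixed interval and emitting only the deltas turns the raw counters into a per-interval throttling signal that can be alerted on.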

https://medium.com/@Pinterest_Engineering/handling-network-throttling-with-aws-ec2-at-pinterest-fda0efc21083?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Apr 7, 2025

Improving Pinterest Search Relevance Using Large Language Models

Pinterest Search is a key surface where users can discover inspiring content that aligns with their information needs, and search relevance measures how well the search results align with the search query. To improve the search relevance model, a 5-level guideline is used to measure the relevance between queries and Pins. A cross-encoder language model takes the query together with the Pin's text and predicts the Pin's relevance to the query, and the task is formulated as a multiclass classification problem. The model is fine-tuned using human-annotated data, minimizing cross-entropy loss. To represent each Pin, a varied set of text features is used, including Pin titles and descriptions, synthetic image captions, high-engagement query tokens, user-curated board titles, and link titles and descriptions. However, the cross-encoder LLM-based classifier is hard to scale for Pinterest Search due to real-time latency and cost considerations. Therefore, knowledge distillation is used to distill the LLM-based teacher model into a lightweight student relevance model. The student model uses query-level features, Pin-level features, and query-Pin interaction features to predict 5-scale relevance scores. Knowledge distillation and semi-supervised learning are employed to train the student model, which makes effective use of vast amounts of initially unlabeled data and expands the data to a wide range of languages from around the world. Offline experiments demonstrate the effectiveness of each modeling decision, including the comparison of language models, the importance of enriching text features, and scaling up training labels through distillation. Online results show a +2.18% improvement in search feed relevance, as measured by nDCG@20, and a significant uptick in search fulfillment rates globally. The proposed relevance modeling pipeline effectively generalizes across languages not encountered during training, and the multilingual LLM-based relevance teacher model generalizes across unseen languages. Future work will explore the integration of servable LLMs, vision-and-language multimodal models, and active learning strategies to dynamically scale and improve the quality of the training data.
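
For readers unfamiliar with the metric, the Python sketch below computes nDCG@k from a list of graded relevance labels in ranked order. It uses the linear-gain form of DCG and normalizes against the ideal ordering of the same result list; the summary does not say which gain variant or label encoding Pinterest uses, so both are assumptions.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering.

    Returns 0.0 when no result has positive relevance.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A score of 1.0 means the top-k results are already in the best possible order, so a relative gain such as the reported +2.18% reflects more relevant Pins moving toward the top of the feed.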

https://medium.com/pinterest-engineering/improving-pinterest-search-relevance-using-large-language-models-4cd938d4e892?source=rss-ef81ef829bcb------2 medium.com

RSS Hunter • Apr 4, 2025