GPU-Serving Two-Tower Models for Lightweight Ads Engagement Prediction

Pinterest developed a new GPU-served two-tower model for the lightweight ranking stage of its ads system, the stage that narrows down ad candidates before they reach the heavier downstream models. The model uses an MMOE-DCN architecture (Multi-gate Mixture-of-Experts combined with a Deep & Cross Network) to balance prediction quality against serving latency. It replaced the previous MTMD model and shipped together with feature updates, giving a 5-10% reduction in offline loss for click-through rate (CTR) prediction. Segmenting standard and shopping ads further reduced loss and sped up model iteration. Training efficiency improved through dataloader optimizations, model code adjustments, and training configuration tuning. Evaluation used KL divergence loss and covered both auction winners and auction candidates. Online experiments showed significant reductions in cost per click (CPC) and increases in CTR, so the gains held both offline and online. The work marks progress in scaling recommender systems with models that are more complex yet still efficient to serve, and it was a collaborative effort across multiple teams at Pinterest.
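To make the two-tower idea concrete, here is a minimal sketch in plain Python. It is not Pinterest's implementation. The towers are toy one-layer networks rather than MMOE-DCN, and every name here (`make_tower`, `score`, `rank_candidates`, the feature values) is hypothetical. Scoring with a sigmoid over the embedding dot product is also an assumption; the summary does not say how the two embeddings are combined. The sketch shows only the structural property that makes two-tower models cheap to serve: the ad side and the user side are encoded independently, so ad embeddings can be computed ahead of time.

```python
import math
import random


def make_tower(in_dim, out_dim, seed):
    """Return a toy one-layer tower: a fixed random linear map followed by tanh.

    Stand-in for a real tower; the summary's MMOE-DCN layers are not modeled.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def tower(features):
        assert len(features) == in_dim
        return [math.tanh(sum(w * x for w, x in zip(row, features)))
                for row in weights]

    return tower


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(user_emb, ad_emb):
    """Assumed scoring rule: sigmoid of the embedding dot product."""
    return sigmoid(sum(u * a for u, a in zip(user_emb, ad_emb)))


def rank_candidates(user_emb, ad_embs, top_k):
    """Return the ids of the top_k ads by predicted engagement."""
    ranked = sorted(ad_embs.items(),
                    key=lambda item: score(user_emb, item[1]),
                    reverse=True)
    return [ad_id for ad_id, _ in ranked[:top_k]]


# Hypothetical towers with different input sizes but a shared embedding size.
user_tower = make_tower(in_dim=4, out_dim=3, seed=0)
ad_tower = make_tower(in_dim=5, out_dim=3, seed=1)

# Ad embeddings depend only on ad features, so they can be precomputed.
ad_features = {
    "ad_a": [1.0, 0.0, 0.5, 0.2, 0.0],
    "ad_b": [0.0, 1.0, 0.1, 0.9, 0.3],
    "ad_c": [0.4, 0.4, 0.0, 0.0, 1.0],
}
ad_embs = {ad_id: ad_tower(feats) for ad_id, feats in ad_features.items()}

# At request time only the user tower runs, followed by cheap dot products.
user_emb = user_tower([0.3, 1.0, 0.0, 0.7])
shortlist = rank_candidates(user_emb, ad_embs, top_k=2)
print(shortlist)
```

Because the per-candidate work at request time is a single dot product, a stage like this can score many candidates and pass only a shortlist to the more expensive downstream rankers.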