Making Ads Count: Using MMoE and Auxiliary Tasks to Better Connect Buyers & Sellers

Etsy improved its Ads Search ranking model to increase buyer engagement and seller visibility. The goal was to predict purchase intent more accurately and so surface more relevant ad listings. Two enhancements made this possible: adopting the Multi-gate Mixture-of-Experts (MMoE) architecture and using add-to-cart as an auxiliary signal.

The original multitask model optimized for click-through rate (CTR) and post-click conversion rate (PCCVR), but it suffered from data sparsity in the later stages of the purchase journey. MMoE addresses the "seesaw phenomenon" in multitask learning, where optimizing one task can degrade another. It introduces specialized "experts" and "gates" that let each task learn its own patterns while still benefiting from shared representations.

The MMoE architecture consists of a shared bottom followed by experts, which are parallel subnetworks that learn different patterns in the data. Each task (here CTR and PCCVR) has its own gating network that controls how it combines the expert outputs.

Tuning the MMoE involved experimenting with the number, size, and type of experts. Heterogeneous experts (DCN- and MLP-based) showed improved metrics. Challenges included ensuring expert utilization and specialization.

Regularization techniques such as expert dropout and temperature scaling were explored to address these issues. Temperature scaling, which softens the probability distribution over experts, proved more effective at promoting both utilization and specialization.

Beyond clicks and purchases, Etsy recognized the value of other user interactions such as add-to-cart and favorites. These actions indicate high purchase intent and are more plentiful than purchases, offering stronger signals for the model.

Introducing auxiliary tasks, specifically add-to-cart, helps the model learn more generalizable representations of user engagement. This lets the more frequent signals benefit the sparser purchase prediction, ultimately leading to a more effective ranking system.
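
The gating mechanism described above can be illustrated with a small forward-pass sketch. This is not Etsy's implementation: the class name TinyMMoE, the layer sizes, the random initialization, and the temperature value are all assumptions chosen for illustration, the input vector stands in for the output of the shared bottom, and training and loss weighting are left out. It only shows how each task's gate mixes the expert outputs and how dividing the gate logits by a temperature above 1 flattens the mixture weights.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Dense layer; `weights` holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases (illustrative initialization only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TinyMMoE:
    """Hypothetical minimal MMoE: shared experts, one gate and one tower per task."""

    def __init__(self, n_features, n_experts, expert_dim, tasks, temperature=1.0, seed=0):
        rng = random.Random(seed)
        self.temperature = temperature
        self.experts = [init_layer(rng, n_features, expert_dim) for _ in range(n_experts)]
        self.gates = {t: init_layer(rng, n_features, n_experts) for t in tasks}
        self.towers = {t: init_layer(rng, expert_dim, 1) for t in tasks}

    def forward(self, x):
        # Every expert sees the same input (assumed to be the shared-bottom output).
        expert_outs = [relu(linear(x, w, b)) for w, b in self.experts]
        preds, gate_weights = {}, {}
        for task, (gate_w, gate_b) in self.gates.items():
            # Per-task mixture weights over experts, softened by the temperature.
            g = softmax(linear(x, gate_w, gate_b), self.temperature)
            mixed = [
                sum(g[k] * expert_outs[k][d] for k in range(len(expert_outs)))
                for d in range(len(expert_outs[0]))
            ]
            tower_w, tower_b = self.towers[task]
            preds[task] = sigmoid(linear(mixed, tower_w, tower_b)[0])
            gate_weights[task] = g
        return preds, gate_weights


# Two main tasks plus add-to-cart as an auxiliary head; the values are made up.
model = TinyMMoE(n_features=4, n_experts=3, expert_dim=5,
                 tasks=["ctr", "pccvr", "add_to_cart"], temperature=2.0)
preds, gates = model.forward([0.2, -1.0, 0.5, 1.5])
```

Here the auxiliary add-to-cart head shares the same experts as the CTR and PCCVR heads, which is how a denser signal could shape the representations used for purchase prediction. Inspecting gates for each task gives a rough view of expert utilization: weights that are near one-hot suggest an expert is dominating, while a higher temperature spreads them out.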