Advancements in Embedding-Based Retrieval at Pinterest Homefeed

At Pinterest Homefeed, embedding-based retrieval is a key candidate generator: it retrieves highly personalized, engaging content to serve a range of user intents. The team built on a two-tower model, adding advanced feature crossing and ID embeddings to improve model performance.

Feature crossing is central to the model, and the team experimented with MaskNet and DHEN to scale up the architecture. MaskNet applies feature-wise multiplication, which keeps the architecture simple while providing extensive feature crossing and high learnability. DHEN is a framework that ensembles multiple different feature crossing layers, both serially and in parallel, and it brought further improvement to the model.

The team also adopted ID embeddings pre-trained with contrastive learning on sampled negatives over a large-window, cross-surface dataset. Directly fine-tuning these embeddings can lead to overfitting; the team found that freezing the embedding table and applying aggressive dropout mitigates the problem.

On the serving side, the team revamped the serving corpus by switching to a time-decayed summation to determine the score of a Pin and by closing the gap between the training data and the serving corpus. The team also explored state-of-the-art modeling techniques such as multi-embedding retrieval and conditional retrieval to further improve the embedding-based retrieval model. Together, these changes led to significant improvements in user engagement and recommendation funnel efficiency.
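To make the feature-wise multiplication idea concrete, here is a minimal, dependency-free sketch of a MaskNet-style block. It is not Pinterest's implementation: the function names, the dimensions, the random weights, and the omission of bias terms are all assumptions made for illustration. The block derives a mask from the input features themselves, multiplies it element-wise into the normalized features, and then applies a hidden layer.

```python
import math
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def relu(vector):
    return [max(0.0, v) for v in vector]


def layer_norm(vector, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(vector) / len(vector)
    var = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(var + eps) for v in vector]


def instance_guided_mask(features, w_agg, w_proj):
    """Two-layer network that maps the input features to a mask of the same size."""
    return matvec(w_proj, relu(matvec(w_agg, features)))


def mask_block(features, w_agg, w_proj, w_hidden):
    """Feature-wise multiplication of the mask with normalized features, then a hidden layer."""
    mask = instance_guided_mask(features, w_agg, w_proj)
    masked = [m * v for m, v in zip(mask, layer_norm(features))]
    return relu(layer_norm(matvec(w_hidden, masked)))


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, agg_dim, hidden_dim = 4, 8, 3  # illustrative sizes only
w_agg = random_matrix(agg_dim, dim, rng)
w_proj = random_matrix(dim, agg_dim, rng)
w_hidden = random_matrix(hidden_dim, dim, rng)

user_features = [0.2, -1.0, 0.7, 1.5]  # made-up input to one tower
output = mask_block(user_features, w_agg, w_proj, w_hidden)
```

Because the mask is computed from the same input it multiplies, every output unit depends on products of input features, which is where the crossing comes from. Several such blocks can be stacked in series or run in parallel.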
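The time-decayed summation used to score Pins for the serving corpus can be sketched as follows. The summary does not state the decay function or what is being summed, so this example assumes exponential decay with a half-life over per-Pin engagement events; `time_decayed_score`, `select_corpus`, the `half_life_days` parameter, and the sample data are all hypothetical.

```python
import math


def time_decayed_score(events, now, half_life_days=7.0):
    """Sum engagement weights, each discounted exponentially by its age in days.

    `events` is a list of (timestamp_in_days, weight) pairs.
    """
    decay_rate = math.log(2) / half_life_days
    return sum(weight * math.exp(-decay_rate * (now - ts)) for ts, weight in events)


def select_corpus(engagements_by_pin, now, k, half_life_days=7.0):
    """Return the ids of the k highest-scoring Pins."""
    scores = {
        pin_id: time_decayed_score(events, now, half_life_days)
        for pin_id, events in engagements_by_pin.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up data: three old engagements versus one recent engagement.
engagements = {
    "old_pin": [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)],
    "fresh_pin": [(27.0, 1.0)],
}
corpus = select_corpus(engagements, now=28.0, k=1)
```

With a 7-day half-life, each 28-day-old engagement is discounted to 1/16 of its weight, so the single recent engagement outweighs the three old ones and `fresh_pin` is selected. A plain count of engagements would have picked `old_pin` instead.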