The authors of the paper are Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, and Furu Wei, all from Microsoft Corporation. The paper is available on arXiv under the CC0 1.0 license.

The paper is organized into an abstract and introduction, related work, method, experiments, analysis, conclusion, and references. The method section covers synthetic data generation and training. The experiments section reports the results of model fine-tuning and evaluation, including multilingual retrieval. The analysis section examines whether contrastive pre-training is necessary and how training hyperparameters affect performance.

The results show that initializing from Mistral-7B outperforms LLaMA-2 7B. The choice of pooling type and LoRA rank has little effect on performance, but the way instructions are added has a considerable impact. The authors conclude that natural language instructions enable the model to generate more discriminative embeddings, and that the framework makes it possible to customize the behavior of text embeddings through instructions without fine-tuning the model or rebuilding the document index.
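To make the instruction mechanism concrete, here is a minimal sketch of the two pieces the summary alludes to: prepending a natural-language task instruction to the query (documents are embedded as-is), and pooling a single embedding from the decoder's hidden states by taking the last non-padding token. The `format_query` template and the toy NumPy "hidden states" below are illustrative assumptions, not the authors' exact code.

```python
import numpy as np

def format_query(task_description: str, query: str) -> str:
    """Prepend a natural-language task instruction to the query side only.

    This lets one model serve different retrieval tasks without
    fine-tuning or re-indexing the document collection.
    """
    return f"Instruct: {task_description}\nQuery: {query}"

def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Select the hidden state of the last non-padding token per sequence.

    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len)
    with 1 for real tokens and 0 for padding.
    """
    last_idx = attention_mask.sum(axis=1) - 1  # index of the final real token
    return hidden_states[np.arange(hidden_states.shape[0]), last_idx]

# Toy demonstration: random arrays stand in for a real LLM's hidden states.
rng = np.random.default_rng(0)
h = rng.standard_normal((2, 5, 4))                   # batch=2, seq_len=5, dim=4
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])  # first sequence is padded
emb = last_token_pool(h, mask)
assert emb.shape == (2, 4)
assert np.allclose(emb[0], h[0, 2])  # last real token of sequence 0 is at index 2

print(format_query("Given a web search query, retrieve relevant passages",
                   "what is LoRA?"))
```

In practice the formatted string would be tokenized and passed through the decoder-only LLM, and the pooled vector compared to document embeddings by cosine similarity.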
