Context engineering case studies: Etsy-specific question answering
This post explores prompt engineering with large language models (LLMs) for AI-assisted onboarding at Etsy, focusing on how truthful and reliable LLM-generated answers are for Etsy-specific questions. The study examined two use cases: questions about the internal Travel & Entertainment (T&E) policy and questions from the external Etsy seller community forums. For the T&E policy, the LLM answered approximately 86% of questions correctly; the remaining 14% contained factual errors or misleading statements, termed "hallucinations." Instructing the LLM to admit uncertainty or to explain its reasoning helped mitigate these hallucinations. On the community forums, where the data is more heterogeneous, accuracy dropped to around 72%. The LLM performed better when a query closely matched the wording of the reference documents. The study also found limits: for certain kinds of complex questions, even providing additional context did not produce a correct answer. Asking the LLM to quote the source snippets behind its answer was identified as a way to flag potential hallucinations. Overall, prompt engineering shows promise, but prompts must be crafted carefully to deliver reliable AI assistance in onboarding and information retrieval.
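Two of the mitigations mentioned above, letting the model admit uncertainty and asking it to quote a source snippet, can be illustrated with a minimal sketch. This is not the prompt or tooling used in the study: the function names (`build_prompt`, `snippet_is_grounded`), the `ANSWER:`/`SNIPPET:` response format, and the sample policy sentence in the usage note are all assumptions made up for illustration. The sketch builds a prompt and then checks whether the snippet the model quotes actually appears in the reference text. No LLM is called, so the model responses are hand-written stand-ins.

```python
import re


def build_prompt(question: str, reference: str) -> str:
    """Assemble a prompt that permits "I don't know" and requests a supporting quote.

    The wording and the ANSWER/SNIPPET format are hypothetical, not taken from the study.
    """
    return (
        "Answer the question using only the reference document below.\n"
        "If the document does not contain the answer, reply exactly: I don't know.\n"
        "Otherwise reply in two lines:\n"
        "ANSWER: <your answer>\n"
        "SNIPPET: <a verbatim quote from the document that supports the answer>\n\n"
        f"REFERENCE:\n{reference}\n\n"
        f"QUESTION: {question}\n"
    )


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def snippet_is_grounded(response: str, reference: str) -> bool:
    """Return True only if the response quotes a snippet found verbatim in the reference.

    A False result does not prove a hallucination; it only flags the answer for review.
    """
    match = re.search(r"^SNIPPET:\s*(.+)$", response, flags=re.MULTILINE)
    if match is None:
        return False  # no snippet offered (includes "I don't know" replies)
    snippet = _normalize(match.group(1).strip("\"' "))
    return bool(snippet) and snippet in _normalize(reference)
```

For example, with the invented reference sentence "Employees may expense meals up to $50 per day while traveling.", a response whose snippet reads `meals up to $50 per day` passes the check, while one quoting `meals up to $75 per day` is flagged. An exact-substring check is deliberately strict: a paraphrased but correct quote would also be flagged, so in practice a flagged answer is a cue for human review, not an automatic rejection.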