Autonomous Observability at Pinterest (Part 1 of 2)
Pinterest's observability stack was fragmented: logs, traces, and metrics lived in separate silos. This made it hard to form a holistic view of platform issues and forced engineers to navigate multiple interfaces. The team adopted a "shift-left" and "shift-right" approach to improve instrumentation and production monitoring. To overcome the data fragmentation, they turned to AI and context engineering, specifically the Model Context Protocol (MCP). They built an MCP server that unifies disparate observability signals such as metrics, logs, traces, and change events, so that AI agents can access and correlate the data without a complete infrastructure overhaul. The server gives unified access to these data pillars, with fine-grained context control and plug-and-play extensibility, and acts as a hub for agentic observability experiences that teams can use to build context-aware tools. The main challenge was the model's limited context size relative to the massive volume of data processed. Mitigations included generating direct links to relevant dashboards or giving AI agents more specific tool documentation.
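
The pattern described above can be illustrated with a small, self-contained sketch. It is not Pinterest's implementation and does not use any MCP SDK: the class `ObservabilityHub`, the `register`/`query`/`correlate` methods, the character budget, and the `dashboards.example.invalid` URL are all hypothetical names chosen for illustration. The sketch shows two ideas from the summary: one entry point over several signal types with plug-in style registration, and a fallback to a dashboard link when a result would not fit in the model's context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical budget: a rough cap on characters handed back to the model.
MAX_CONTEXT_CHARS = 2000


@dataclass
class ToolResult:
    signal: str      # e.g. "logs", "metrics", "traces", "change_events"
    payload: str     # inline data, or a short summary with a link
    truncated: bool  # True when the raw data was too large to inline


class ObservabilityHub:
    """Toy stand-in for a unified observability server (assumption, not MCP)."""

    def __init__(self, max_chars: int = MAX_CONTEXT_CHARS) -> None:
        self._tools: Dict[str, Callable[[str], List[str]]] = {}
        self._max_chars = max_chars

    def register(self, signal: str, fetch: Callable[[str], List[str]]) -> None:
        # Plug-and-play: a new signal source is just another fetch function.
        self._tools[signal] = fetch

    def query(self, signal: str, service: str) -> ToolResult:
        rows = self._tools[signal](service)
        text = "\n".join(rows)
        if len(text) <= self._max_chars:
            return ToolResult(signal, text, truncated=False)
        # Too large for the context budget: return a pointer, not the data.
        link = f"https://dashboards.example.invalid/{signal}?service={service}"
        summary = (
            f"{len(rows)} {signal} records for {service}; "
            f"too large to inline. See {link}"
        )
        return ToolResult(signal, summary, truncated=True)

    def correlate(self, service: str) -> Dict[str, ToolResult]:
        # One call gathers every registered signal for the same service.
        return {signal: self.query(signal, service) for signal in self._tools}


# Usage with made-up data sources.
hub = ObservabilityHub(max_chars=200)
hub.register("change_events", lambda svc: [f"deploy {svc} v42"])
hub.register("logs", lambda svc: [f"{svc} ERROR timeout #{i}" for i in range(100)])

results = hub.correlate("checkout")
for name, result in results.items():
    print(name, result.truncated, result.payload)
```

In this sketch the small change-event result is returned inline, while the oversized log result is replaced by a record count and a link, which keeps the agent's context small while still pointing it (or the engineer) at the full data.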