DEV Community
Agentic RAG with OpenSearch Serverless: Anatomy of a Pattern
The author critiques the new agentic AI-focused OpenSearch Serverless, highlighting pitfalls such as cold starts, exploding costs, and the misconception that serverless removes the need for architecture. Classical RAG struggles in agentic settings, where the LLM iteratively calls tools and reformulates queries across diverse data. Financial agents, for example, need fast, low-latency vector searches across multiple indices. In OpenSearch Serverless, OpenSearch Compute Units (OCUs) scale per collection, and cold starts on idle collections are a significant source of latency.

The agentic RAG pattern consists of ingestion, embedding, and an iterative retrieval-generation cycle built from orchestrators, tools, and memory. Key configuration choices include HNSW indexing for low latency and hierarchical chunking with metadata filtering for retrieval quality. Reranking the retrieved documents with a cross-encoder markedly improves precision. Sizing considerations include OCU memory limits, P99 vector search latency, and cold start times.

The pattern suits variable traffic, non-uniform knowledge growth, and multi-tenancy. It is a poor fit when the SLO for end-to-end agent response is below 500 ms. Anti-patterns include a single multi-tenant index queried without tenant filters, un-cached query embeddings, a high k without reranking, ignoring OCU costs for batch ingestion, and non-idempotent ingestion pipelines. Security requires KMS customer managed keys (CMKs), least-privilege IAM roles, VPC endpoints, and query auditing. Observability requires monitoring of infrastructure, application traces, and offline retrieval quality metrics.
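Three of the anti-patterns listed above (unfiltered multi-tenant queries, un-cached query embeddings, and non-idempotent ingestion) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the author's implementation: the field names `embedding` and `tenant_id`, the helper names, and the `fake_embed` placeholder are all hypothetical, and the query body assumes the collection's k-NN engine accepts a `filter` clause inside the `knn` query.

```python
import hashlib
from functools import lru_cache


def chunk_id(tenant_id: str, source_uri: str, chunk_index: int, text: str) -> str:
    """Derive a deterministic document ID from the chunk's identity and content.

    Indexing with this ID means a re-run of the pipeline overwrites the same
    document instead of creating a duplicate (idempotent ingestion).
    """
    payload = "\x1f".join([tenant_id, source_uri, str(chunk_index), text])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def fake_embed(text: str, dim: int = 8) -> tuple[float, ...]:
    """Hypothetical stand-in for a real embedding model call.

    It only hashes the text so the example runs offline; replace it with
    whatever embedding client the pipeline actually uses.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return tuple(b / 255.0 for b in digest[:dim])


@lru_cache(maxsize=4096)
def cached_query_embedding(query: str) -> tuple[float, ...]:
    """Cache query embeddings so a repeated query string is embedded once."""
    return fake_embed(query)


def build_knn_query(vector, tenant_id: str, k: int = 20, field: str = "embedding") -> dict:
    """Build a k-NN search body that always carries a tenant filter.

    The field names are assumptions; adapt them to the index mapping.
    """
    return {
        "size": k,
        "query": {
            "knn": {
                field: {
                    "vector": list(vector),
                    "k": k,
                    # Restrict candidates to one tenant inside the k-NN clause.
                    "filter": {"term": {"tenant_id": tenant_id}},
                }
            }
        },
    }
```

An agent loop would call `cached_query_embedding` for each reformulated query, pass the result to `build_knn_query`, and send the body to the search endpoint with its client of choice. The retrieved candidates would then go to a reranker before generation, a step this sketch leaves out.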