ADR: Adopting Amazon Bedrock AgentCore in Production

The author, an AWS financial platform architect, details the decision-making process behind adopting Amazon Bedrock AgentCore to operationalize AI agents in a regulated financial environment. Traditional approaches struggled with critical operational concerns such as 2 AM failures and regulatory compliance. Five key forces made a solution urgent: managing cross-turn state, ensuring regulatory traceability, implementing robust guardrails, controlling unpredictable token costs, and achieving runtime portability.

Several options were considered, including a self-hosted solution on EKS, the previous generation of Bedrock Agents, and Step Functions with Lambda. The self-hosted EKS option was dismissed because of its high operational responsibility and engineering cost. The prior Bedrock Agents generation was deemed insufficient because of its limited observability and budget control. Step Functions was deemed inadequate as a conversational agent runtime despite its strengths in deterministic workflows.

Amazon Bedrock AgentCore emerged as the recommended solution, offering a managed runtime with native features for session memory, guardrails, traceability, and tool use. The decisive factors for choosing AgentCore were its Gateway, with per-tool OAuth2/OIDC support, and its managed session memory with configurable TTL, both crucial for security and compliance in finance. The author acknowledges the trade-off of platform lock-in for the runtime but emphasizes that the underlying tools remain portable.

The article provides concrete configuration advice for Guardrails, AgentCore Memory, the Gateway, and token budgets, highlighting their importance for effective and secure operation. It outlines observability metrics such as TurnsPerSession, TokensPerSession, ToolCallFailureRate, and GuardrailInterventionRate, and describes using X-Ray for detailed tracing and CloudTrail for regulatory audit.

The author also warns of consequences and risks, including runtime lock-in, conservative quotas, guardrail latency, and memory costs, and urges teams to accept them deliberately and plan mitigations.
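
The four observability metrics named in this note, and the idea of a per-session token budget, can be illustrated with a minimal Python sketch. It makes no AWS calls. The `Turn` record shape, the metric formulas (simple averages and ratios), and the 1,500-token budget are all assumptions chosen for illustration; the article's own definitions and thresholds may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Turn:
    """One agent turn. Hypothetical record shape, not an AgentCore type."""
    session_id: str
    tokens: int                 # input + output tokens consumed by the turn
    tool_calls: int             # tool invocations attempted in the turn
    tool_failures: int          # how many of those invocations failed
    guardrail_intervened: bool  # True if a guardrail blocked or altered the turn


def session_metrics(turns: Iterable[Turn]) -> dict[str, float]:
    """Aggregate turn records into the four metrics (assumed definitions)."""
    turns = list(turns)
    if not turns:
        raise ValueError("at least one turn is required")
    session_count = len({t.session_id for t in turns})
    total_tool_calls = sum(t.tool_calls for t in turns)
    total_failures = sum(t.tool_failures for t in turns)
    return {
        "TurnsPerSession": len(turns) / session_count,
        "TokensPerSession": sum(t.tokens for t in turns) / session_count,
        # Avoid dividing by zero when no tools were called at all.
        "ToolCallFailureRate": (
            total_failures / total_tool_calls if total_tool_calls else 0.0
        ),
        "GuardrailInterventionRate": (
            sum(t.guardrail_intervened for t in turns) / len(turns)
        ),
    }


def sessions_over_budget(turns: Iterable[Turn], budget_tokens: int) -> list[str]:
    """Return the IDs of sessions whose total tokens exceed the budget."""
    totals: dict[str, int] = {}
    for t in turns:
        totals[t.session_id] = totals.get(t.session_id, 0) + t.tokens
    return sorted(s for s, used in totals.items() if used > budget_tokens)


# Made-up sample data: two sessions with two turns each.
sample = [
    Turn("s1", tokens=1200, tool_calls=2, tool_failures=1, guardrail_intervened=False),
    Turn("s1", tokens=800, tool_calls=1, tool_failures=0, guardrail_intervened=True),
    Turn("s2", tokens=500, tool_calls=1, tool_failures=0, guardrail_intervened=False),
    Turn("s2", tokens=300, tool_calls=0, tool_failures=0, guardrail_intervened=False),
]

metrics = session_metrics(sample)
over = sessions_over_budget(sample, budget_tokens=1500)
```

In a real deployment these values would more likely be published as custom metrics and alarmed on than computed in memory. The sketch only shows one plausible way the numbers relate to per-turn records.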