CloudWatch to OTel: Tearing Down the Observability Bridge Pattern

The CloudWatch to OpenTelemetry bridge pattern is a common solution for financial platforms that need to consolidate AWS metrics into a unified observability backend. Its purpose is to resolve fragmented observability by forwarding metrics from AWS services to an external system that speaks OpenTelemetry (OTLP).

In the typical setup, AWS services emit metrics to CloudWatch. CloudWatch Metric Streams then push those metrics out, or a poller retrieves them with the GetMetricData API. The metrics are converted to OpenTelemetry format, often by a Lambda function or an OpenTelemetry Collector, before being sent to an observability backend such as Datadog or Grafana. (Metric Streams can also emit OpenTelemetry 0.7.0 or 1.0.0 directly as an output format, which removes the conversion step when the backend accepts it.) However, this "obvious" solution carries hidden complexity, especially in high-volume financial environments where metric freshness and API costs are critical.

The pattern has two ingestion paths. CloudWatch Metric Streams delivering through Kinesis Data Firehose offers low latency and costs that scale predictably, because it is billed per metric update. Polling with GetMetricData gives fine-grained control over which metrics are fetched and when, but it risks hitting API rate limits and is billed per metric requested on every poll, so costs climb quickly as metric count or polling frequency grows.

A robust transformation Lambda must be idempotent, wrap calls to the external endpoint in a circuit breaker, and send failed records to a Dead Letter Queue that has alarms on it.

The pattern is a good fit when an external OTLP-speaking backend is required, metric volume justifies the cost of the stream, and the metric freshness SLO is 60 seconds or less. It also decouples observability from AWS and enables trace-metric correlation.

Security is paramount: IAM policies should be scoped to specific resources and constrained with condition keys, data should be encrypted at rest with KMS (for example, Firehose server-side encryption) and protected with TLS in transit, and network egress to the backend should be tightly controlled.

Anti-patterns to avoid include naive polling, Lambda functions without proper error handling or reserved concurrency, and attempting to forward logs or traces through this metric-focused bridge.
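The conversion step can be sketched as a Firehose data-transformation Lambda. This is a minimal sketch, assuming the metric stream uses the JSON output format (newline-delimited objects with fields such as namespace, metric_name, dimensions, timestamp in epoch milliseconds, and a value object holding count, sum, min, and max). The names handler and to_otlp_point and the flat output shape are hypothetical: the output only mirrors the OTLP data model and would still need to be serialized into real OTLP by whatever sends it to the backend.

```python
import base64
import json
from typing import Any


def to_otlp_point(rec: dict[str, Any]) -> dict[str, Any]:
    """Map one metric-stream JSON record to a simplified, OTLP-style dict.

    ASSUMPTION: this flat shape only mirrors the OTLP data model (resource
    attributes, metric name, summary-like statistics); it is not the exact
    OTLP/JSON wire schema a backend expects.
    """
    stats = rec["value"]
    return {
        "resource": {
            "cloud.provider": "aws",
            "cloud.account.id": rec["account_id"],
            "cloud.region": rec["region"],
        },
        "name": f'{rec["namespace"]}/{rec["metric_name"]}',
        "unit": rec.get("unit", ""),
        "attributes": rec.get("dimensions", {}),
        "time_unix_nano": int(rec["timestamp"]) * 1_000_000,  # epoch ms -> ns
        "count": stats["count"],
        "sum": stats["sum"],
        "min": stats["min"],
        "max": stats["max"],
    }


def handler(event: dict[str, Any], context: Any = None) -> dict[str, Any]:
    """Firehose transformation entry point: one output entry per input recordId."""
    results = []
    for record in event["records"]:
        try:
            text = base64.b64decode(record["data"]).decode("utf-8")
            # A single Firehose record can carry several newline-delimited metric lines.
            points = [to_otlp_point(json.loads(line)) for line in text.splitlines() if line.strip()]
        except (ValueError, KeyError, TypeError):
            # Return the original bytes; Firehose treats ProcessingFailed records as failures.
            results.append({"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]})
            continue
        if not points:
            results.append({"recordId": record["recordId"], "result": "Dropped", "data": record["data"]})
            continue
        payload = "".join(json.dumps(p) + "\n" for p in points)
        results.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
        })
    return {"records": results}
```

Returning exactly one entry per recordId, with a result of Ok, Dropped, or ProcessingFailed, is the contract Firehose expects from a transformation function, so a malformed record is flagged rather than silently discarded.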
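The cost and rate-limit trade-off between the two ingestion paths is easy to underestimate, so a back-of-the-envelope model helps. This sketch assumes caller-supplied prices per 1,000 metrics requested (GetMetricData) and per 1,000 metric updates (Metric Streams), one stream update per metric per minute, and no Firehose or Lambda charges; check current CloudWatch pricing for your region. Workload, batch_queries, and monthly_estimate are hypothetical helpers; the only API fact encoded is that a single GetMetricData call accepts at most 500 metric queries.

```python
import math
from dataclasses import dataclass
from typing import Iterator, Sequence

MAX_QUERIES_PER_CALL = 500  # GetMetricData accepts at most 500 MetricDataQuery entries


def batch_queries(queries: Sequence[dict], size: int = MAX_QUERIES_PER_CALL) -> Iterator[list]:
    """Split metric queries into GetMetricData-sized batches."""
    for start in range(0, len(queries), size):
        yield list(queries[start:start + size])


@dataclass
class Workload:
    metrics: int                 # distinct metrics to forward
    poll_interval_s: int         # e.g. 60 s to match a 60 s freshness SLO
    usd_per_1k_requested: float  # ASSUMPTION: GetMetricData price, verify for your region
    usd_per_1k_updates: float    # ASSUMPTION: Metric Streams price, excludes Firehose


def monthly_estimate(w: Workload, days: int = 30) -> dict:
    polls = days * 86_400 // w.poll_interval_s
    calls_per_poll = math.ceil(w.metrics / MAX_QUERIES_PER_CALL)
    # Polling: every metric is billed again on every poll.
    polling_usd = polls * w.metrics / 1000 * w.usd_per_1k_requested
    # Streams: ASSUMPTION of one update per metric per minute (standard resolution).
    updates = w.metrics * days * 1440
    stream_usd = updates / 1000 * w.usd_per_1k_updates
    return {
        "calls_per_poll": calls_per_poll,
        "api_calls": polls * calls_per_poll,
        "polling_usd": polling_usd,
        "stream_usd": stream_usd,
    }
```

With 10,000 metrics polled every 60 seconds and illustrative prices of $0.01 and $0.003 per 1,000, the model gives 864,000 GetMetricData calls and about $4,320 a month for polling versus about $1,296 for the stream, before Firehose charges. The absolute figures depend entirely on the assumed prices; the structural point is that polling also multiplies API calls (20 per cycle here) against the account's rate limits.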
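The idempotency, circuit-breaker, and DLQ requirements can be combined in one forwarding loop. This is a minimal sketch under several assumptions: send and dead_letter are hypothetical callables (in practice an OTLP HTTP export and, for example, an SQS send_message call), deduplication uses an in-memory set that only spans one Lambda execution environment (cross-invocation idempotency would need a shared store, such as a DynamoDB conditional write), and the breaker thresholds are placeholders. The dlq_count field stands in for a self-metric that would be published and alarmed on.

```python
import hashlib
import json
import time
from typing import Any, Callable, Optional

Point = dict[str, Any]


def idempotency_key(point: Point) -> str:
    """Same metric identity and timestamp always yield the same key."""
    identity = {k: point[k] for k in ("name", "attributes", "time_unix_nano")}
    return hashlib.sha256(json.dumps(identity, sort_keys=True).encode("utf-8")).hexdigest()


class CircuitBreaker:
    """Opens after N consecutive failures; allows calls again after a cool-down."""

    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: after the cool-down, calls are attempted again. The failure
        # count is not reset until a success, so one more failure re-opens it.
        return self.clock() - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


class Forwarder:
    def __init__(self, send: Callable[[Point], None], dead_letter: Callable[[Point], None],
                 breaker: CircuitBreaker) -> None:
        self.send = send                # hypothetical OTLP export; raises on failure
        self.dead_letter = dead_letter  # hypothetical DLQ writer, e.g. wrapping SQS
        self.breaker = breaker
        self.seen: set[str] = set()     # ASSUMPTION: per-environment only, not durable
        self.dlq_count = 0              # self-metric to publish and alarm on (> 0)

    def _to_dlq(self, point: Point) -> str:
        self.dead_letter(point)
        self.dlq_count += 1
        return "dead_lettered"

    def forward(self, point: Point) -> str:
        key = idempotency_key(point)
        if key in self.seen:
            return "duplicate"            # retry of an already-delivered point
        if not self.breaker.allow():
            return self._to_dlq(point)    # fail fast while the endpoint is down
        try:
            self.send(point)
        except Exception:
            self.breaker.record_failure()
            return self._to_dlq(point)
        self.breaker.record_success()
        self.seen.add(key)
        return "sent"
```

The clock is injectable so breaker behavior can be tested without sleeping. While the breaker is open, points go straight to the DLQ instead of piling retries onto a failing endpoint, which also keeps Lambda duration (and reserved concurrency) from being consumed by timeouts.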
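The IAM guidance can be made concrete for the role that Metric Streams assumes in order to write into Firehose. This sketch builds the trust and permissions policy documents as Python dicts. The account ID, region, and resource names are hypothetical, and the confused-deputy conditions on aws:SourceAccount and aws:SourceArn assume the service principal populates these global condition keys, so verify against the Metric Streams documentation before relying on them.

```python
import json

# Hypothetical identifiers, for illustration only.
ACCOUNT_ID = "123456789012"
REGION = "us-east-1"
STREAM_NAME = "otel-bridge-stream"
FIREHOSE_NAME = "otel-bridge-firehose"


def trust_policy(account_id: str, region: str, stream_name: str) -> dict:
    """Let only this account's named metric stream assume the role."""
    stream_arn = f"arn:aws:cloudwatch:{region}:{account_id}:metric-stream/{stream_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "streams.metrics.cloudwatch.amazonaws.com"},
            "Action": "sts:AssumeRole",
            # ASSUMPTION: these global keys are populated for this service principal.
            "Condition": {
                "StringEquals": {"aws:SourceAccount": account_id},
                "ArnLike": {"aws:SourceArn": stream_arn},
            },
        }],
    }


def permissions_policy(account_id: str, region: str, firehose_name: str) -> dict:
    """Allow writes to one delivery stream only, never a wildcard."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
            "Resource": f"arn:aws:firehose:{region}:{account_id}:deliverystream/{firehose_name}",
        }],
    }


if __name__ == "__main__":
    print(json.dumps(trust_policy(ACCOUNT_ID, REGION, STREAM_NAME), indent=2))
    print(json.dumps(permissions_policy(ACCOUNT_ID, REGION, FIREHOSE_NAME), indent=2))
```

KMS encryption of the delivery stream and egress controls for the backend endpoint are configured on other resources and are not shown here.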
A well-architected implementation filters metrics at the stream level so unneeded metrics are never streamed or billed, tunes buffering to balance delivery efficiency against freshness, instruments the bridge itself for observability, and enforces strict security controls.
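Stream-level filtering and buffering can both be expressed as request parameters. This sketch builds keyword arguments for boto3's put_metric_stream, using IncludeFilters with per-namespace MetricNames and the opentelemetry1.0 output format (use "json" instead if a transformation Lambda will parse the records), plus a BufferingHints dict for a Firehose destination. The helper names, the latency figure passed in, and the idea of deriving the buffer interval from the freshness SLO are assumptions; Firehose enforces its own allowed ranges for these values depending on destination type.

```python
from typing import Mapping, Sequence


def metric_stream_request(
    name: str,
    firehose_arn: str,
    role_arn: str,
    wanted: Mapping[str, Sequence[str]],
) -> dict:
    """Build put_metric_stream kwargs that stream only the listed metrics."""
    return {
        "Name": name,
        "FirehoseArn": firehose_arn,
        "RoleArn": role_arn,
        "OutputFormat": "opentelemetry1.0",
        # Only these namespace/metric pairs are streamed, and therefore billed.
        "IncludeFilters": [
            {"Namespace": namespace, "MetricNames": sorted(metrics)}
            for namespace, metrics in sorted(wanted.items())
        ],
    }


def buffering_hints(freshness_slo_s: int, other_latency_s: int, size_mb: int = 1) -> dict:
    """Spend whatever the SLO leaves after non-buffering latency on buffering.

    ASSUMPTION: other_latency_s is measured lag outside Firehose buffering
    (CloudWatch to stream, plus backend ingest). Validate the result against
    the BufferingHints limits for your Firehose destination type.
    """
    interval = freshness_slo_s - other_latency_s
    if interval <= 0:
        raise ValueError("freshness SLO is already consumed by non-buffering latency")
    return {"SizeInMBs": size_mb, "IntervalInSeconds": interval}
```

The request could be submitted with boto3.client("cloudwatch").put_metric_stream(**request). If buffering_hints raises, the freshness SLO cannot be met by tuning buffering alone, which is itself a useful signal when deciding whether this pattern fits.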