DZone.com

Stop Loading Everything into Redshift: A Spectrum + Iceberg Pattern for Hybrid Analytics

Not every dataset belongs fully inside the warehouse. A hybrid design using Apache Iceberg on S3, Redshift Spectrum, and Redshift local tables can reduce duplicated storage and reserve warehouse performance for the workloads that need it.

The Warehouse Became the Second Data Lake

Redshift clusters routinely carry tables that should not be there. A five-year transaction history is loaded nightly through a four-hour COPY job and queried twice a quarter. Raw event tables landed directly into the warehouse because the lake pipeline was harder to set up. Aggregations that nobody owns, kept around because deleting them feels risky.