
From data lakes to user applications: How Bigtable works with Apache Iceberg

The Bigtable Spark connector lets you work with Bigtable data directly from Apache Spark, enabling powerful use cases that leverage Apache Iceberg. The connector supports reading and writing Bigtable data from Spark in Scala, Spark SQL, and DataFrames, giving data pipelines direct access to operational data for ML model training, ETL/ELT, and real-time dashboards.

The connector also supports query optimizations such as join pushdown and dynamic column filtering, which opens up Bigtable and Apache Iceberg integrations for accelerated data science, low-latency serving, and other use cases. Data scientists can work with Bigtable's operational data inside their Apache Spark environments, streamlining data preparation, exploration, analysis, and the creation of Iceberg tables. And because the connector supports write-back, Spark jobs can push real-time updates into Bigtable for low-latency serving.

The connector fits a range of workloads, such as tracking vehicle telemetry, and can be combined with Bigtable Data Boost to run high-throughput read jobs over operational data without impacting the performance of applications serving from Bigtable.

To get started, add the Bigtable Spark connector dependency to your Apache Spark environment and define a JSON mapping between the Spark data format and Bigtable's data model, as sketched below.
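Here is a minimal sketch of those two steps, assuming an sbt build. The table name, column family (`metrics`), columns, and project/instance IDs are illustrative placeholders, and the connector version should be checked against Maven Central:

```scala
// build.sbt (version is illustrative; check Maven Central for the latest):
// libraryDependencies += "com.google.cloud.spark.bigtable" %% "spark-bigtable" % "<version>"

import org.apache.spark.sql.SparkSession

object BigtableTelemetryRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("bigtable-telemetry").getOrCreate()

    // The catalog JSON maps Spark columns to the Bigtable row key and to
    // column family/qualifier pairs; all names here are placeholders.
    val catalog =
      """{
        |  "table": {"name": "vehicle_telemetry"},
        |  "rowkey": "vehicle_id",
        |  "columns": {
        |    "vehicle_id": {"cf": "rowkey",  "col": "vehicle_id", "type": "string"},
        |    "speed":      {"cf": "metrics", "col": "speed",      "type": "double"},
        |    "fuel_level": {"cf": "metrics", "col": "fuel_level", "type": "double"}
        |  }
        |}""".stripMargin

    val telemetry = spark.read
      .format("bigtable")
      .option("catalog", catalog)
      .option("spark.bigtable.project.id", "my-project")   // placeholder
      .option("spark.bigtable.instance.id", "my-instance") // placeholder
      .load()

    // Query the operational data with Spark SQL.
    telemetry.createOrReplaceTempView("telemetry")
    spark.sql("SELECT vehicle_id, speed FROM telemetry WHERE speed > 100").show()
  }
}
```

From here, the same DataFrame can be persisted as an Iceberg table with Spark's DataFrameWriterV2 API (for example `telemetry.writeTo(...)`) against whatever Iceberg catalog is configured on the session.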
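Write-back is the same path in reverse: a DataFrame written through the connector updates rows in Bigtable. This sketch continues from the session and catalog above; `spark.bigtable.create.new.table` matches my reading of the connector's documented options, but verify the exact flag against the docs:

```scala
// Continues from the previous sketch: reuses `spark` and `catalog`.
val lowFuel = spark.sql(
  "SELECT vehicle_id, speed, fuel_level FROM telemetry WHERE fuel_level < 0.1")

lowFuel.write
  .format("bigtable")
  .option("catalog", catalog)
  .option("spark.bigtable.project.id", "my-project")   // placeholder
  .option("spark.bigtable.instance.id", "my-instance") // placeholder
  // Create the destination table if it does not already exist.
  .option("spark.bigtable.create.new.table", "true")
  .save()
```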
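For Data Boost, the pattern as I understand it is to create a Bigtable app profile with Data Boost enabled and point the read job at that profile, so the job uses serverless compute rather than the cluster serving your application. The `spark.bigtable.app.profile.id` option name and the profile ID below are assumptions to check against the connector documentation:

```scala
// `data-boost-profile` is a hypothetical app profile created with
// Data Boost enabled; reads through it avoid consuming serving-cluster capacity.
val analytics = spark.read
  .format("bigtable")
  .option("catalog", catalog)
  .option("spark.bigtable.project.id", "my-project")   // placeholder
  .option("spark.bigtable.instance.id", "my-instance") // placeholder
  .option("spark.bigtable.app.profile.id", "data-boost-profile") // assumed option
  .load()
```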