Cloud Blog: Building a real-time analytics platform using BigQuery and Bigtable

Source URL: https://cloud.google.com/blog/products/databases/using-reverse-etl-between-bigtable-and-bigquery/
Source: Cloud Blog
Title: Building a real-time analytics platform using BigQuery and Bigtable

Feedly Summary: When developing a real-time architecture, there are two fundamental questions that you need to ask yourself in order to make the right technology choice:

Freshness – how fast does the data need to be available?

Query latency – how fast do you need to be able to retrieve the data once it is available? 

For the past ten years, Bigtable has served as critical infrastructure addressing both of these questions for large-scale, low-latency use cases such as fraud detection, data mesh, recommendation engines, clickstream analysis, ads serving, internet-of-things insights, and a variety of other scenarios where access to current data is essential for operational workflows. 
At the same time, increased demand for AI and real-time data integration in today’s applications has thrust data analytics platforms like BigQuery into the heart of operational systems, blurring the traditional boundaries between databases and analytics. Customers tell us that they love BigQuery for easily integrating multiple data sources, enhancing their data with AI and ML, and even using data science tools such as Pandas to directly manipulate data in the warehouse. However, they also tell us that they still need to make pre-processed data in BigQuery available for fast retrieval in an operational system that can handle large-scale datasets with millisecond query performance. 
To address this need, the EXPORT DATA to Bigtable (reverse ETL) function is now generally available, helping bridge the gap between analytics and operational systems while providing the query latency that real-time systems need. Now, anyone who can write SQL can easily transform their BigQuery analysis into Bigtable’s highly performant data model, retrieve it with single-digit millisecond latency and high QPS, and replicate it throughout the world so it is closer to users. You can also query the Bigtable data using the same SQL dialect that BigQuery uses, currently in preview. 
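As a minimal sketch of what such an export can look like (the project, instance, profile, table, dataset, and column names below are illustrative, not taken from the article), the EXPORT DATA statement takes a Bigtable table URI and a query whose STRING rowkey column becomes the row key, with STRUCT columns mapping to column families:

```sql
-- Illustrative names throughout: my-project, my-instance, my-profile, my-table.
EXPORT DATA OPTIONS (
  uri = 'https://bigtable.googleapis.com/projects/my-project/instances/my-instance/appProfiles/my-profile/tables/my-table',
  format = 'CLOUD_BIGTABLE',
  auto_create_column_families = TRUE
) AS
SELECT
  CAST(artwork_id AS STRING) AS rowkey,        -- becomes the Bigtable row key
  STRUCT(title, artist, description) AS info   -- written as the "info" column family
FROM my_dataset.artworks;
```

Here auto_create_column_families lets the export create the target column families if they do not already exist, so the Bigtable table can start out empty.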
In this blog, we explain three use cases and architectures that can benefit from automated on-demand data exports from BigQuery to Bigtable:

Real-time application serving 

Enriched streaming data for ML

Backloading data sketches to build real-time metrics that rely on big data 

1. Real-time application serving 
Bigtable complements existing BigQuery features for building real-time applications. BigQuery’s storage format is optimized for OLAP (analytics) queries such as counts and aggregations. If your real-time application requires this type of ad-hoc analysis, BigQuery BI Engine can accelerate common queries by intelligently caching your most frequently used data. Additionally, if you need to identify a specific row (or rows) not by key but by text filtering, including within JSON, BigQuery search indexes can address these text lookups. 
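As a sketch of that last point (the table and index names here are illustrative), a search index makes token lookups across columns, including JSON, efficient:

```sql
-- Index every column of the table for text lookups (illustrative names).
CREATE SEARCH INDEX events_idx ON my_dataset.events (ALL COLUMNS);

-- Find the rows containing a token anywhere, including inside JSON payloads.
SELECT *
FROM my_dataset.events AS e
WHERE SEARCH(e, 'order-12345');
```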
While BigQuery is a versatile analytics platform, it is not optimized for the real-time application serving that Bigtable is designed to deliver. OLAP-based storage isn’t always ideal for quickly accessing multiple columns within a specific row or range of rows. This is where Bigtable’s data storage model shines, making it well-suited for powering operational applications.

The above diagram highlights how real-time features across BigQuery and Bigtable can unite to deliver a full real-time application.

Consider using Bigtable as a serving layer when your application requires one or more of the following:

Single-digit millisecond latency on row lookups with consistent and predictable response times

Very high queries per second (QPS scales linearly with number of nodes)

Low-latency writes in the application

Global deployments (automatically replicating the data near the users)
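On the serving side, a point lookup with Bigtable’s SQL support (in preview, as noted above) might look like the following sketch; the table and column-family names are illustrative, and GoogleSQL for Bigtable exposes the row key as _key and column families as map-valued columns:

```sql
-- Read a single row by key, with map-style access to the "info" column family.
SELECT _key, info['title'], info['description']
FROM artworks_serving
WHERE _key = 'artwork#12345';
```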

An example
Picture this: You’re standing in the Metropolitan Museum of Art, surrounded by magnificent Greek and Roman sculptures. Who is that a statue of, and what are they doing? You pull out your phone, open the museum’s app, and point it at the statue. Instantly, a wealth of information appears on your screen — not just dry facts, but a vivid narrative that brings the artwork to life.
That’s the experience we’re crafting. But behind the scenes, delivering it is a huge technical challenge: BigQuery, our data warehouse, holds all the structured details about the artwork. However, the museum goer doesn’t want to stand around for a long time while the system queries and returns the data. Plus, we want to provide engaging text descriptions, even though our analytics platform is geared towards structured data.
Here’s where reverse ETL and Bigtable come in. Reverse ETL allows you to seamlessly move data from warehouse tables to a real-time architecture, reducing query latency. Let’s break down how to create this immersive museum app. Please note that Reverse ETL does require either a BigQuery Enterprise or Enterprise Plus edition. 
Step 1: Create a Bigtable instance and serving table
If you are not familiar with Bigtable, it might be worth a quick glance at this page to understand the concepts of instances, clusters, and nodes. 
The first thing you need to do is follow the instructions to create a Bigtable instance, which is a container for your Bigtable data. When you create this instance, you will need to choose the storage type of either SSD or HDD. Generally, SSD is much more performant and the right choice for production environments, but HDD can help keep costs down if you are just looking to educate yourself on how Bigtable works. While creating an instance, you also create your first cluster within the instance. This cluster needs to be in the same region as the BigQuery dataset from which you intend to load the data. However, you do have the flexibility to add clusters in other regions, which automatically receive a copy of the data from the cluster that BigQuery uses for writing data. 
Once your Bigtable instance and cluster are ready to go, it’s time to create your Bigtable table, which is used as the BigQuery sink in the reverse ETL process. From the console, choose Tables in the left navigation panel and then choose Create Table at the top of the Tables screen. 
In the Create a Table screen, the only thing you need to do is name the Table ID BQ_SINK and click Create. We let the BigQuery reverse ETL process create the necessary column families for us in the third step. 
Alternatively, you can use the CLI to connect to your instance and run the command cbt createtable BQ_SINK.
Step 2: Create an application profile for the BigQuery reverse ETL job
Bigtable uses app profiles to control how an instance handles incoming requests. It’s a good practice to isolate your BigQuery data export process within its own app profile. Enable single-cluster routing within this profile to ensure your data lands in the same region as your BigQuery dataset. Additionally, set it to low priority to prevent disruptions to your primary Bigtable application traffic.
Here’s a gcloud command to create a Bigtable app profile with these settings (the profile, instance, and cluster names below are placeholders; substitute your own):

gcloud bigtable app-profiles create bq-reverse-etl-profile \
    --instance=my-instance \
    --description="App profile for BigQuery reverse ETL exports" \
    --route-to=my-cluster \
    --priority=PRIORITY_LOW