Cloud Blog: Write better log queries, faster: Introducing pipe syntax in BigQuery and Cloud Logging

Source URL: https://cloud.google.com/blog/products/data-analytics/introducing-pipe-syntax-in-bigquery-and-cloud-logging/
Source: Cloud Blog
Title: Write better log queries, faster: Introducing pipe syntax in BigQuery and Cloud Logging

Feedly Summary: Log data has become an invaluable resource for organizations seeking to understand application behavior, optimize performance, strengthen security, and enhance user experiences. But the sheer volume and complexity of logs generated by modern applications can feel overwhelming. How do you extract meaningful insights from this sea of data?
At Google Cloud, we’re committed to providing you with the most powerful and intuitive tools to unlock the full potential of your log data. That’s why we’re thrilled to announce a series of innovations in BigQuery and Cloud Logging designed to revolutionize the way you manage, analyze, and derive value from your logs.
BigQuery pipe syntax: Reimagine SQL for log data
Say goodbye to the days of deciphering complex, nested SQL queries. BigQuery pipe syntax ushers in a new era of SQL, designed with the semi-structured nature of log data in mind. It introduces an intuitive, top-down syntax that mirrors how you naturally approach data transformations. As demonstrated in recent research by Google, this approach leads to significant improvements in query readability and writability. By visually separating the stages of a query with the pipe symbol (|>), it becomes remarkably easy to follow the logical flow of data transformation. Each step is clear, concise, and self-contained, making your queries more approachable for both you and your team.
BigQuery’s pipe syntax isn’t just about cleaner SQL — it’s about unlocking a more intuitive and efficient way to work with your data. Instead of wrestling with code, experience faster insights, improved collaboration, and more time spent extracting value.
This streamlined approach is especially powerful when it comes to the world of log analysis. 
In log analysis, exploration is key: you rarely move in a straight line from question to answer. Analyzing logs often means sifting through mountains of data to find specific events or patterns; you explore, you discover, and you refine your approach as you go. Pipe syntax embraces this iterative workflow. You can smoothly chain together filters (WHERE), aggregations (COUNT), and sorting (ORDER BY) to extract those golden insights, and you can add or remove steps as you uncover new information, easily adjusting your analysis on the fly.
Imagine you want to count the number of users who were affected by the same error more than 100 times in the month of January. As shown below, the pipe syntax's linear structure clearly shows the data flowing through each transformation: start from the table, filter by date, count by user ID and error type, filter for counts above 100, and finally count the number of users affected by each error.

code_block
-- Pipe Syntax
FROM log_table
|> WHERE datetime BETWEEN DATETIME '2024-01-01' AND DATETIME '2024-01-31'
|> AGGREGATE COUNT(log_id) AS error_count GROUP BY user_id, error_type
|> WHERE error_count > 100
|> AGGREGATE COUNT(user_id) AS user_count GROUP BY error_type

The same example in standard SQL syntax typically requires a subquery and a non-linear structure.

code_block
-- Standard Syntax
SELECT error_type, COUNT(user_id) AS user_count
FROM (
  SELECT user_id, error_type,
         COUNT(log_id) AS error_count
  FROM log_table
  WHERE datetime BETWEEN DATETIME '2024-01-01' AND DATETIME '2024-01-31'
  GROUP BY user_id, error_type
)
WHERE error_count > 100
GROUP BY error_type;
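To make the data flow concrete, here is a minimal pure-Python sketch of the same four-stage pipeline, with hypothetical in-memory records standing in for `log_table` (the record values are illustrative only, not from the original example):

```python
from collections import Counter
from datetime import datetime

# Hypothetical log records standing in for rows of log_table.
logs = (
    [{"log_id": i, "user_id": "u1", "error_type": "E500",
      "datetime": datetime(2024, 1, 5)} for i in range(150)]
    + [{"log_id": 1000 + i, "user_id": "u2", "error_type": "E500",
        "datetime": datetime(2024, 1, 9)} for i in range(120)]
    + [{"log_id": 2000 + i, "user_id": "u3", "error_type": "E404",
        "datetime": datetime(2024, 1, 12)} for i in range(50)]
)

# |> WHERE datetime BETWEEN '2024-01-01' AND '2024-01-31'
january = [r for r in logs
           if datetime(2024, 1, 1) <= r["datetime"] <= datetime(2024, 1, 31)]

# |> AGGREGATE COUNT(log_id) AS error_count GROUP BY user_id, error_type
error_count = Counter((r["user_id"], r["error_type"]) for r in january)

# |> WHERE error_count > 100
frequent = {key: n for key, n in error_count.items() if n > 100}

# |> AGGREGATE COUNT(user_id) AS user_count GROUP BY error_type
user_count = Counter(error_type for (_, error_type) in frequent)

print(user_count)  # Counter({'E500': 2})
```

Each intermediate variable corresponds to one pipe stage, which is what makes the pipe form easy to read: every step consumes the previous step's output, top to bottom.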

Carrefour: A customer’s perspective
The impact of these advancements is already being felt by our customers. Here’s what Carrefour, a global leader in retail, had to say about their experience with pipe syntax:
 "Pipe syntax has been a very refreshing addition to BigQuery. We started using it to dig into our audit logs, where we often use Common Table Expressions (CTEs) and aggregations. With pipe syntax, we can filter and aggregate data on the fly by just adding more pipes to the same query. This iterative approach is very intuitive and natural to read and write. We are now using it for our analysis work in every business domain. We will have a hard time going back to the old SQL syntax now!" – Axel Thevenot, Lead Data Engineer, and Guillaume Blaquiere, Data Architect, Carrefour
BigQuery pipe syntax is currently available in private preview. To sign up for the private preview, please use this form, and check out this introductory video.
Beyond syntax: performance and flexibility
But we haven’t stopped at simplifying your code. BigQuery now offers enhanced performance and powerful JSON handling capabilities to further accelerate your log analytics workflows. Given the prevalence of JSON data in logs, we expect these changes to simplify log analytics for the majority of users.

Enhanced Point Lookups: Pinpoint critical events in massive datasets quickly using BigQuery’s numeric search indexes, which dramatically accelerate queries that filter on timestamps and unique IDs. Here is a sample improvement from the announcement blog.

Metrics             | Without Index     | With Index  | Improvement
--------------------|-------------------|-------------|------------
Execution Time (ms) | 48,790            | 4,664       | 10x
Processed Bytes     | 2,174,758,158,336 | 774,897,664 | 2,806x
Slot Usage (ms)     | 25,735,222        | 7,300       | 3,525x

Powerful JSON Analysis: Parse and analyze your JSON-formatted log data with ease using BigQuery’s JSON_KEYS function and JSONPath traversal feature. Extract specific fields, filter on nested values, and navigate complex JSON structures without breaking a sweat.

JSON_KEYS extracts unique JSON keys from JSON data for easier schema exploration and discoverability.

Query                                                          | Results
---------------------------------------------------------------|-------------------
JSON_KEYS(JSON '{"a":{"b":1}}')                                | ["a", "a.b"]
JSON_KEYS(JSON '{"a":[{"b":1}, {"c":2}]}', mode => "lax")      | ["a", "a.b", "a.c"]
JSON_KEYS(JSON '[[{"a":1},{"b":2}]]', mode => "lax recursive") | ["a", "b"]
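To make these results concrete, here is a simplified pure-Python sketch of the key-collection behavior the examples above illustrate. It is an illustrative approximation of the JSON_KEYS modes, not BigQuery's actual implementation:

```python
def json_keys(doc, mode="strict"):
    """Return sorted dotted key paths found in parsed JSON `doc`.

    A simplified sketch of JSON_KEYS semantics: "strict" descends only
    through objects, "lax" additionally looks through one level of array
    wrapping, and "lax recursive" looks through nested arrays.
    """
    array_depth = {"strict": 0, "lax": 1, "lax recursive": float("inf")}[mode]
    keys = set()

    def unwrap(value, depth):
        # Yield the non-array values reachable through permitted array nesting.
        if isinstance(value, list):
            if depth > 0:
                for item in value:
                    yield from unwrap(
                        item, depth if mode == "lax recursive" else depth - 1)
        else:
            yield value

    def walk(value, prefix):
        for obj in unwrap(value, array_depth):
            if isinstance(obj, dict):
                for k, v in obj.items():
                    path = f"{prefix}.{k}" if prefix else k
                    keys.add(path)
                    walk(v, path)

    walk(doc, "")
    return sorted(keys)

print(json_keys({"a": {"b": 1}}))                              # ['a', 'a.b']
print(json_keys({"a": [{"b": 1}, {"c": 2}]}, mode="lax"))      # ['a', 'a.b', 'a.c']
print(json_keys([[{"a": 1}, {"b": 2}]], mode="lax recursive")) # ['a', 'b']
```

The three calls reproduce the three rows of the table above, which shows why the lax modes matter for logs: keys hidden inside arrays still surface during schema exploration.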

JSONPath with lax modes lets you easily fetch values from JSON arrays without verbose UNNEST operations. The example below shows how to fetch all phone numbers from the person field, before and after:

code_block
-- Consider a JSON field 'person' as
-- [{
--   "name": "Bob",
--   "phone": [{"type": "home", "number": 20}, {"number": 30}]
-- }]

-- Previously, to fetch all phone numbers from the 'person' column
SELECT phone.number
FROM (
  SELECT IF(JSON_TYPE(person.phone) = "array",
            JSON_QUERY_ARRAY(person.phone), [person.phone]) AS nested_phone
  FROM (
    SELECT IF(JSON_TYPE(person) = "array",
              JSON_QUERY_ARRAY(person), [person]) AS nested_person
    FROM t
  ), UNNEST(nested_person) person
), UNNEST(nested_phone) phone

-- With lax mode
SELECT JSON_QUERY(person, "lax recursive $.phone.number") FROM t
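Conceptually, the lax mode looks through array wrapping at every step of the path. The following pure-Python sketch approximates that traversal for illustration; `lax_query` is a hypothetical helper, not BigQuery's implementation:

```python
def lax_query(value, path):
    """Fetch all values at a dotted `path`, looking through any array
    nesting along the way -- a simplified sketch of what a "lax recursive"
    JSONPath traversal does."""
    current = [value]
    for key in path.split("."):
        next_level = []
        stack = list(current)
        while stack:
            item = stack.pop(0)
            if isinstance(item, list):
                stack = item + stack  # unwrap arrays at any depth
            elif isinstance(item, dict) and key in item:
                next_level.append(item[key])
        current = next_level
    # Unwrap any trailing array nesting in the results as well.
    flat, stack = [], list(current)
    while stack:
        item = stack.pop(0)
        if isinstance(item, list):
            stack = item + stack
        else:
            flat.append(item)
    return flat

person = [{"name": "Bob",
           "phone": [{"type": "home", "number": 20}, {"number": 30}]}]
print(lax_query(person, "phone.number"))  # [20, 30]
```

Because every array along the path is transparently unwrapped, the caller never has to know (or care) at which levels the data happens to be array-wrapped, which is exactly what makes the UNNEST boilerplate unnecessary.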

Log Analytics in Cloud Logging: Bringing it all together
Log Analytics in Cloud Logging is built on top of BigQuery and provides a UI purpose-built for log analysis. With an integrated date/time picker, charting, and dashboarding, Log Analytics uses these JSON capabilities to support advanced queries and analyze logs faster. To seamlessly integrate these capabilities into your log management workflow, we’re also enhancing Log Analytics in Cloud Logging with pipe syntax. You can now analyze your logs within Log Analytics using the full power of BigQuery pipe syntax, enhanced lookups, and JSON handling, all within a unified platform.

Use of pipe syntax in Log Analytics (Cloud Logging) is now available in preview.
Unlock the future of log analytics today
BigQuery and Cloud Logging provide an unmatched solution for managing, analyzing, and extracting actionable insights from your log data. Explore these new capabilities today and experience the power of:

Intuitive querying with pipe syntax – Introductory video,  Documentation

Unified log management and analysis with Log Analytics in Cloud Logging 

Blazing-fast lookups with numeric search indexes – Documentation 

Seamless JSON analysis with JSON_KEYS and JSONPath

Start your journey towards more insightful and efficient log analytics in the cloud with BigQuery and Cloud Logging. Your data holds the answers — we’re here to help you find them.

AI Summary and Description: Yes

Summary: The text details innovations in Google Cloud’s BigQuery and Cloud Logging aimed at enhancing log data management and analysis. It introduces a new pipe syntax designed to simplify SQL for log data, making it more intuitive and efficient. The developments promise significant improvements in query performance and JSON handling capabilities, which can optimize security and operational efficiencies.

Detailed Description:
The content discusses several significant advancements in Google Cloud’s capabilities regarding log data analysis, specifically focusing on BigQuery and Cloud Logging. These innovations include:

- **BigQuery Pipe Syntax**:
  - A new syntax tailored for semi-structured log data, designed to improve the way users write SQL queries.
  - It allows for a clearer approach to data transformations, enhancing the readability and writability of queries.
  - Facilitates iterative data exploration, enabling users to chain filters, aggregations, and sorting steps effortlessly.

- **Enhanced Performance Features**:
  - **Point Lookups**: The introduction of numeric search indexes allows for faster pinpointing of critical events across vast datasets.
  - Considerable improvements in execution time and processed bytes lead to enhanced efficiency when querying large volumes of log data.

- **Powerful JSON Handling**:
  - Enhanced functions such as JSON_KEYS and JSONPath traversal allow users to extract fields and analyze complex JSON structures quickly.
  - This capability is crucial for organizations that deal with large amounts of JSON-formatted logs, streamlining the process of identifying relevant data points.

- **Log Analytics Integration**:
  - The integration of enhanced features into Log Analytics (part of Cloud Logging) provides a unified interface for users to analyze log data.
  - Features such as an integrated date/time picker and charting tools aid in enhancing data insights visually.

- **Customer Perspectives**:
  - Feedback from Carrefour illustrates the positive impact of the new pipe syntax on log analysis, emphasizing its intuitive nature and collaborative benefits.

Overall Significance:
– These innovations cater to professionals in fields such as cloud security and infrastructure management, as improved log analysis tools can assist in detecting security anomalies, optimizing performance, and ensuring compliance with governance measures.
– The ability to manage and analyze log data more efficiently can lead to better decision-making, enhanced risk management, and a more robust security posture, crucial aspects for organizations operating in increasingly complex digital landscapes.

In conclusion, Google Cloud’s updates to BigQuery and Cloud Logging not only streamline the querying process but also enhance the tools available for analyzing log data, contributing to more secure and compliant operational practices in cloud environments.