Cloud Blog: New BigQuery capabilities for data and AI governance

Source URL: https://cloud.google.com/blog/products/data-analytics/how-dataplex-provides-data-governance-for-the-ai-era/
Source: Cloud Blog
Title: New BigQuery capabilities for data and AI governance

Feedly Summary: Across industries and disciplines, generative AI is transforming the way we work, from sparking new forms of creativity and revolutionizing customer experiences to unlocking hidden insights within complex data. At the same time, this revolution hinges on high-quality, well-governed, and accessible data.
Data may be the foundation of training and grounding AI models, but for decades, governance of that data has been an afterthought in the enterprise. With the rise of AI, however, it is now front and center of enterprises’ data strategies, even as they struggle to discover, govern, and understand their distributed data assets. In fact, 66% of organizations report that at least half their data remains unused or undiscovered, while only 44% of data leaders fully trust the quality of their organization’s data. As a result, poorly managed data leads to flawed AI and unreliable insights, hindering effective decision-making.
These are precisely the challenges that Dataplex is designed to address. Dataplex is the unified governance foundation for the entire BigQuery platform, providing automated data discovery, curation and management at scale. More importantly, Dataplex minimizes tedious, error-prone and manual governance processes, instead making them pervasive, contextual and always-on. By deeply integrating with Google Cloud services, Dataplex creates a unified inventory of metadata across projects, regions and storage systems. This comprehensive view empowers users to perform global search over distributed data, enrich and organize that data, manage governance policies effectively, and maintain strong security, all while fostering data democratization. Moreover, Dataplex offers a variety of intelligent data management capabilities, including lineage tracking, data profiling and automated quality checks to help users build trust in their data and maximize data-related ROI. As a result, Dataplex has been widely adopted since its launch in 2022, with over 95% of top Google Cloud data analytics customers using it for their data management and governance needs.
Cloud content management provider Box.Inc uses Dataplex as its go-to tool for enhanced data governance, discovery and observability.
“Leveraging Dataplex, we embarked on a transformative journey to enhance our Data Platform by enhancing developer efficiency while tightening security policies across all regions. Dataplex serves as our central data catalog, providing data discovery, lineage tracking, and governance capabilities.” – Yeshvant Kumar Bhavnasi Venkat Satya, Senior Software Engineer and Asmita Kulkarni, Senior Product Manager, Box.Inc. 
This year, we’ve supercharged Dataplex with powerful new features to help you navigate the complexities of data in the era of generative AI. Read on to learn more about Dataplex’s newest features and how they position you to take the most advantage of generative AI with full confidence in the quality of your data assets.
1. Automated cataloging: Discover your data and AI assets in a unified way
Dataplex automatically harvests, ingests, and indexes metadata from across your data estate. In addition to data assets in BigQuery, Pub/Sub and Cloud Storage, we recently extended Dataplex’s automated cataloging to the following sources:

Vertex AI: Models, datasets, and features from Vertex AI are now cataloged in Dataplex in near real-time, providing a coherent view of your data and AI assets.

Operational databases: Cloud SQL, Spanner, and Bigtable assets are now automatically cataloged, providing end-to-end visibility of your data landscape that spans the entire lifecycle.

Looker: A preview of managed cataloging for Looker assets is coming soon, allowing you to discover and manage your BI assets alongside data and AI resources.

With this comprehensive inventory in place, you can easily search, organize, and enrich your data and AI assets, establishing the critical metadata foundation for effective data-to-AI governance.
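To make the idea of a unified inventory concrete, here is a minimal, self-contained sketch in plain Python. It does not call any Dataplex API; the `CatalogEntry` type, the `build_index` and `search` helpers, and all project and asset names are hypothetical and exist only for illustration. It shows how metadata collected from several systems can be merged into one index and searched in a single pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One metadata record in a toy catalog (hypothetical schema)."""
    system: str       # source system, e.g. "bigquery" or "vertex_ai"
    project: str
    name: str
    description: str


# Hypothetical metadata harvested from three separate systems.
BIGQUERY_ENTRIES = [
    CatalogEntry("bigquery", "proj-analytics", "sales.orders", "Order facts"),
    CatalogEntry("bigquery", "proj-analytics", "sales.churn_scores", "Daily churn scores"),
]
VERTEX_ENTRIES = [
    CatalogEntry("vertex_ai", "proj-ml", "churn-model-v3", "Customer churn classifier"),
]
CLOUD_SQL_ENTRIES = [
    CatalogEntry("cloud_sql", "proj-app", "app.customers", "Operational customer table"),
]


def build_index(*sources):
    """Merge per-system metadata lists into a single inventory."""
    return [entry for source in sources for entry in source]


def search(index, keyword):
    """Case-insensitive keyword match over entry names and descriptions."""
    needle = keyword.lower()
    return [
        entry for entry in index
        if needle in entry.name.lower() or needle in entry.description.lower()
    ]


index = build_index(BIGQUERY_ENTRIES, VERTEX_ENTRIES, CLOUD_SQL_ENTRIES)
for hit in search(index, "churn"):
    print(hit.system, hit.project, hit.name)
```

A single query for "churn" returns both the BigQuery table and the Vertex AI model, which is the practical benefit of cataloging data and AI assets side by side.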
2. Enhanced lineage tracking: Understand your data’s end-to-end journey
Dataplex automatically captures the complete lineage of your data, allowing you to trace its origins, transformations, and destinations across your entire data landscape. This view now goes further with the following enhancements:

Lineage for Vertex AI Pipelines: In addition to native integration with BigQuery, Dataproc and Composer, Dataplex is now integrated with Vertex AI Pipelines. This enables traceability of data from processing and analytics through to AI model training and deployment — essential for responsible AI governance and regulatory compliance.

Column-level lineage for BigQuery: You can now dive deeper into your data with field-level lineage tracking in BigQuery. This granular view enables precise impact and root-cause analysis, facilitates the management of sensitive data, and helps ensure compliance with data privacy regulations.
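
Column-level lineage can be pictured as a directed graph whose nodes are columns and whose edges record which column feeds which. The sketch below is a plain-Python illustration of impact analysis (walking downstream) and root-cause analysis (walking upstream) on such a graph. It is not how Dataplex stores or exposes lineage, and the table names, column names, and helper functions are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage edges: (upstream column, downstream column).
LINEAGE_EDGES = [
    ("raw.customers.email", "staging.customers.email_hash"),
    ("raw.customers.email", "staging.customers.email_domain"),
    ("staging.customers.email_hash", "marts.churn_features.customer_key"),
    ("raw.orders.amount", "marts.churn_features.total_spend"),
]


def downstream_columns(edges, start):
    """Impact analysis: every column reachable downstream of `start`."""
    children = defaultdict(list)
    for upstream, downstream in edges:
        children[upstream].append(downstream)

    seen = set()
    queue = deque([start])
    while queue:  # breadth-first traversal of the lineage graph
        column = queue.popleft()
        for child in children[column]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def upstream_columns(edges, start):
    """Root-cause analysis: reverse the edges and reuse the same traversal."""
    reversed_edges = [(down, up) for up, down in edges]
    return downstream_columns(reversed_edges, start)


# Which columns are affected if raw.customers.email changes or is restricted?
print(downstream_columns(LINEAGE_EDGES, "raw.customers.email"))
# Where does marts.churn_features.customer_key ultimately come from?
print(upstream_columns(LINEAGE_EDGES, "marts.churn_features.customer_key"))
```

The same traversal answers the sensitive-data question raised above: starting from a column known to hold personal data, the downstream set lists every derived column that may need the same handling.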

3. Intelligent search: Find what you need, faster
Finding the right data quickly is essential for any data-driven organization. Dataplex already provides global, governed catalog search, and two additions are on the way:

Semantic search: With the upcoming semantic search capability, you can ask questions in natural language, and Dataplex will interpret your intent to retrieve the most relevant results. This makes it much easier for everyone in your organization to find the data they need, regardless of their role or technical expertise.

Full catalog search in BigQuery: We will also launch full catalog search in BigQuery soon, enabling users to search the entire catalog and discover data and AI resources directly within the familiar BigQuery interface.

4. AI-powered data insights: Jumpstart your analysis
Once relevant data is discovered, Dataplex can help you overcome the “cold start” problem with Data Insights. This feature automatically generates suggested questions and validated SQL queries for your data, jumpstarting your analysis and accelerating your time to insight. It helps users of all skill levels uncover insights quickly without writing a line of code, and it gives expert users generated queries they can customize for deeper analysis.
5. Governance rules: Enforce metadata-driven policies at scale
Unified metadata is the foundation of Dataplex. In addition to leveraging metadata for search and discovery, we are launching Dataplex governance rules in preview, allowing you to define and enforce governance policies based on metadata. You can use Dataplex’s search capabilities to pinpoint the data assets or specific fields that need to be governed, and easily create governance rules based on your specific requirements and policies. Dataplex then automatically applies and enforces these rules across your distributed data environment, with built-in monitoring to ensure compliance.
This centralized approach simplifies governance management, reduces security risks, and provides a unified control plane for all your data. Our initial private preview focuses on fine-grained access control, allowing you to efficiently manage access policies across BigQuery and Cloud Storage at scale.
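The metadata-driven approach described above can be illustrated with a small, self-contained Python sketch. It is a toy model, not the Dataplex governance rules API, which the post does not detail. The `Asset` and `Rule` types, the tag names, and the evaluation semantics (an asset is governed by every rule whose tags it carries, and a group needs to be allowed by all of them) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """A data asset with metadata tags (hypothetical schema)."""
    name: str
    system: str      # e.g. "bigquery" or "cloud_storage"
    tags: frozenset


@dataclass(frozen=True)
class Rule:
    """A governance rule selected by metadata rather than by asset name."""
    name: str
    required_tags: frozenset   # rule applies to assets carrying all of these tags
    allowed_groups: frozenset  # groups this rule permits to read matching assets


def matching_rules(asset, rules):
    """Return the rules whose required tags are all present on the asset."""
    return [rule for rule in rules if rule.required_tags <= asset.tags]


def can_read(group, asset, rules):
    """Assumed semantics: ungoverned assets are readable; governed assets
    require the group to be allowed by every matching rule."""
    return all(group in rule.allowed_groups for rule in matching_rules(asset, rules))


ASSETS = [
    Asset("sales.customers", "bigquery", frozenset({"pii", "finance"})),
    Asset("gs://example-logs/app", "cloud_storage", frozenset({"logs"})),
]
RULES = [
    Rule("pii-readers", frozenset({"pii"}), frozenset({"privacy-team"})),
]

for asset in ASSETS:
    for group in ("privacy-team", "analysts"):
        print(asset.name, group, can_read(group, asset, RULES))
```

Because the rule is keyed on the `pii` tag rather than on a table or bucket name, any newly cataloged asset that receives that tag falls under the same policy without a separate per-asset grant, which is the scaling argument behind metadata-driven governance.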
With these new innovations, Dataplex empowers you to navigate the complexities of the data landscape and unlock the full potential of your data in the age of generative AI. Discover, govern, understand, and activate your data to drive innovation and transform your organization. Learn more about Dataplex and begin your data-driven journey today.

AI Summary and Description: Yes

Summary: The text discusses the transformative impact of generative AI across various sectors, emphasizing the importance of data governance and automation through Google Cloud’s Dataplex. Dataplex streamlines data discovery, management, and compliance, and extends cataloging and lineage tracking to AI assets such as Vertex AI models and pipelines.

Detailed Description:
The emergence of generative AI is revolutionizing multiple industries by enabling fresh creativity and improving decision-making processes through data insights. However, the effective use of AI hinges on robust data governance—a facet that has historically been overlooked but is now critical for enterprises aiming to leverage their data assets effectively. Dataplex from Google Cloud addresses these challenges by providing a comprehensive solution for data management and governance.

Key Points:

– **Importance of Data Governance:**
  – Data underpins AI training and performance.
  – Poorly governed data leads to unreliable AI outcomes.
  – 66% of organizations report that at least half their data remains unused or undiscovered.
  – Only 44% of data leaders fully trust the quality of their organization’s data.

– **Features of Dataplex:**
  – **Automated Cataloging:** Streamlines metadata ingestion from various sources (BigQuery, Vertex AI, Cloud SQL, etc.), enhancing visibility across data assets.
  – **Enhanced Lineage Tracking:** Captures end-to-end data lineage, now including Vertex AI Pipelines and column-level lineage in BigQuery, which supports AI model training traceability and regulatory compliance.
  – **Intelligent Search Capabilities:** Introduces semantic search for natural language queries, facilitating easier data discovery for users with varying technical backgrounds.
  – **AI-Powered Insights:** Assists users in rapidly generating insights through suggested questions and SQL queries, effectively reducing analysis time.
  – **Governance Rules:** Enables the definition and enforcement of metadata-driven governance policies, improving security and compliance through unified access controls.

– **Practical Implications:**
  – Organizations can adopt Dataplex to ensure data quality, streamline processes and mitigate risks associated with poorly managed data environments.
  – Enhanced data governance practices can drive innovation by enabling better data utilization and AI integration.
  – Data democratization fosters a culture of informed decision-making across all levels of the organization.

Dataplex positions itself as a vital tool for enterprises navigating the complexities of data management in an AI-driven world. The new features broaden governance options, give users AI-generated starting points for analysis, and underline that disciplined data management is integral to successful AI deployment and regulatory compliance.