Data and AI governance for the data lakehouse

The architectural principles of the data and AI governance pillar cover how to centrally manage assets and access.

Data governance lakehouse architecture diagram for Databricks.

Principles of data and AI governance

  1. Unify data and AI management

    Data and AI management is the foundation for executing the data and AI governance strategy. It involves the collection, integration, organization, and persistence of trusted data assets so that organizations can maximize the value of those assets. A unified catalog centrally and consistently stores all your data and analytical artifacts, as well as the metadata associated with each data object. It enables end users to discover the data sets available to them and provides provenance visibility by tracking the lineage of all data assets.

  2. Unify data and AI security

    Effective data security governance rests on two tenets: knowing who has access to which data, and knowing who has recently accessed which data assets. This information is critical for almost all compliance requirements in regulated industries and is fundamental to any security governance program. With a unified data security system, the permissions model can be centrally and consistently managed across all data assets. Data access is centrally audited, with alerting and monitoring capabilities to promote accountability.

  3. Establish data quality standards

    Data quality is fundamental to deriving accurate and meaningful insights from data. Data quality has many dimensions, including completeness, accuracy, validity, and consistency. It must be actively managed so that the final data sets serve as reliable and trustworthy information for business users.
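As an illustration of the third principle, the sketch below scores a small in-memory data set on three of the dimensions named above (completeness, validity, and consistency) and flags those that fall below a standard. It is a minimal example in plain Python, not a Databricks API: the `orders` records, the field names, the rule functions, and the `THRESHOLD` value are all assumptions made up for this example. Accuracy is left out because measuring it requires a trusted reference to compare against.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

def completeness(records: List[Record], field: str) -> float:
    """Fraction of records in which `field` is present and not None."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

def validity(records: List[Record], field: str,
             is_valid: Callable[[Any], bool]) -> float:
    """Fraction of non-null values of `field` that pass `is_valid`."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 1.0
    return sum(1 for v in values if is_valid(v)) / len(values)

def consistency(records: List[Record],
                rule: Callable[[Record], bool]) -> float:
    """Fraction of records that satisfy a cross-field rule."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule(r)) / len(records)

def total_matches(r: Record) -> bool:
    """Hypothetical rule: total must equal quantity * unit_price."""
    if r["unit_price"] is None:
        return False
    return abs(r["total"] - r["quantity"] * r["unit_price"]) < 1e-9

# Hypothetical sample data; the field names are assumptions.
orders: List[Record] = [
    {"order_id": 1, "quantity": 2, "unit_price": 5.0, "total": 10.0},
    {"order_id": 2, "quantity": -1, "unit_price": 3.0, "total": -3.0},
    {"order_id": 3, "quantity": 4, "unit_price": None, "total": 8.0},
    {"order_id": 4, "quantity": 1, "unit_price": 2.5, "total": 9.0},
]

# Assumed quality standard: every dimension must score at least 0.9.
THRESHOLD = 0.9

report = {
    "completeness": completeness(orders, "unit_price"),
    "validity": validity(orders, "quantity", lambda q: q > 0),
    "consistency": consistency(orders, total_matches),
}

# Dimensions that do not meet the standard and need attention.
failing = sorted(name for name, score in report.items() if score < THRESHOLD)
```

Expressing each dimension as a score between 0 and 1 makes it possible to set an explicit standard per data set and to track whether quality improves over time. In practice, checks like these would typically run inside the data pipeline rather than on an in-memory list.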

Next: Best practices for data and AI governance

See Best practices for data and AI governance.