Data quality for Microsoft Fabric mirrored databases

Mirroring in Fabric is a low-cost, low-latency data replication solution that brings data from various systems together into a single analytics platform. You can continuously replicate your existing data estate directly into Fabric's OneLake, including data from Azure SQL Database, Azure Cosmos DB, and Snowflake.

Mirroring in Fabric provides an end-to-end product designed to simplify your analytics needs. It is built for openness and collaboration between Microsoft and technology solutions that can read the open-source Delta Lake table format. The replica of your data that mirroring creates in OneLake can be used for all your analytical needs. For more details about Fabric mirroring, review the Fabric documentation.

Configure data quality for a Fabric mirrored database

  1. Enable mirroring in your Fabric tenant. Power BI administrators can enable or disable mirroring for the entire organization or for specific security groups, using the setting found in the Power BI admin portal. You enable mirroring by creating a secure connection to your operational data source. You choose whether to replicate an entire database or individual tables, and mirroring automatically keeps your data in sync. Once set up, data continuously replicates into OneLake for analytics consumption.

  2. After you enable mirroring and initiate replication, confirm that replication completes successfully.

  3. Open the SQL analytics endpoint.

    Screenshot showing how to navigate to the SQL analytics endpoint.

  4. On this page, go to the Reporting tab and select Automatically update semantic model.

    Screenshot of the Automatically update semantic model option.

  5. Go to the Microsoft Purview Data Map page and scan the data source. Use service principal authentication.

    Screenshot of using a service principal for the Data Map scan.

  6. When the scan is complete, associate the new data assets with a data product for curation and data quality assessment.

  7. Open the Microsoft Purview Data Quality solution and run a data quality scan or profile your data as usual.
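For step 2, one way to confirm replication programmatically is to inspect a per-table mirroring status payload. The following is a minimal sketch, not the documented procedure. It assumes a JSON payload modeled on the table mirroring status response of the Fabric REST API, with a `data` list whose entries carry `sourceSchemaName`, `sourceTableName`, and `status`, and it assumes `Replicating` is the healthy state. Treat these field names and values as assumptions and check them against the API reference. The function name and the sample payload are hypothetical.

```python
from typing import Any, Dict, List

# Status value treated as healthy in this sketch. This is an assumption
# modeled on the Fabric REST API, not something stated in this article.
HEALTHY_STATUS = "Replicating"


def tables_not_replicating(payload: Dict[str, Any]) -> List[str]:
    """Return 'schema.table' names whose status is not HEALTHY_STATUS."""
    pending = []
    for table in payload.get("data", []):
        if table.get("status") != HEALTHY_STATUS:
            schema = table.get("sourceSchemaName", "?")
            name = table.get("sourceTableName", "?")
            pending.append(f"{schema}.{name}")
    return pending


# Hypothetical sample payload, for illustration only.
sample = {
    "data": [
        {"sourceSchemaName": "dbo", "sourceTableName": "Orders", "status": "Replicating"},
        {"sourceSchemaName": "dbo", "sourceTableName": "Customers", "status": "Snapshotting"},
    ]
}

print(tables_not_replicating(sample))  # ['dbo.Customers']
```

An empty result suggests every listed table has reached the replicating state. Any names returned are tables to wait on or investigate before you scan.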

Important

  • Use service principals for data map scans, and a managed identity for data quality scans.
  • If your mirrored database tables are not available in the Fabric Lakehouse, contact Fabric support.
  • Data quality scanning is supported only for Lakehouse Delta tables and Parquet files.
  • The metadata harvest in Purview for Fabric Lakehouse subartifacts is an enhancement based on the metadata harvest for Fabric, which was released in December 2023. This feature is in private preview.
  • Differentiating shortcut items from native items in the OneLake SDK for Lakehouse subartifacts depends on the Fabric team. For now, all shortcut items (tables and files) are treated as native items during scanning. Your tenant must be allowlisted to enable data quality assessment of Fabric Lakehouse data.
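The authentication and format rules in the notes above can be written down as a small preflight check. This is only an illustrative sketch: the function, the scan type labels (`data_map`, `data_quality`), and the credential labels are hypothetical names chosen for this example and do not correspond to any Purview or Fabric API.

```python
from typing import List, Optional

# Hypothetical labels for this sketch; they are not Purview API values.
REQUIRED_AUTH = {
    "data_map": "service_principal",
    "data_quality": "managed_identity",
}
SUPPORTED_DQ_FORMATS = {"delta", "parquet"}


def preflight(scan_type: str, auth: str, asset_format: Optional[str] = None) -> List[str]:
    """Return a list of rule violations; an empty list means no issue found."""
    problems = []
    expected = REQUIRED_AUTH.get(scan_type)
    if expected is None:
        problems.append(f"unknown scan type: {scan_type}")
    elif auth != expected:
        problems.append(f"{scan_type} scan expects {expected}, got {auth}")
    # The format restriction applies to data quality scans only.
    if scan_type == "data_quality" and asset_format is not None:
        if asset_format.lower() not in SUPPORTED_DQ_FORMATS:
            problems.append(f"unsupported format for data quality scan: {asset_format}")
    return problems


print(preflight("data_map", "service_principal"))         # []
print(preflight("data_quality", "managed_identity", "Delta"))  # []
```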