Ingest data with Azure Synapse Link

Follow the steps in this article to bring your organization’s environmental data from the Sustainability Manager data model in Dataverse to the IngestedRawData lakehouse in your Fabric workspace by using Azure Synapse Link.

Prerequisites

  • Deploy the Environmental data and insights capability in Fabric workspace.
  • Ensure that Microsoft Sustainability Manager is installed and set up in your Dataverse environment.
  • Ensure you have the Dataverse system administrator security role.
  • Deploy the Environmental data and insights capability in your Fabric workspace.
  • Ensure you have an Azure Data Lake Storage Gen2 storage account.
  • Ensure you have the Dataverse system administrator security role.
  • Ensure you have an Azure Data Lake Storage Gen2 account and Owner and Storage Blob Data Contributor role access. Your storage account must enable hierarchical namespace for both initial setup and delta sync. Allow storage account key access is required only for the initial setup.

In this step, you set up an Azure Synapse Link for the Sustainability Manager environment from which you want to bring data into the Sustainability data solutions in Fabric workspace.

  1. Open the Power Apps portal. On the top right, select your Dataverse environment where Sustainability Manager is installed.

  2. Select Azure Synapse Link from the left pane, and then select New link. If it doesn't already appear, select More, select Discover all, select Data management, and then select Azure Synapse Link.

    Screenshot of selecting Azure Synapse Link.

  3. Leave Connect to your Azure Synapse Analytics workspace unchecked. You run the integration and transformation scripts on Fabric, not Synapse.

  4. Enter your storage account and subscription details and select Next. You can only select a storage account located in the same location as your environment. For example, if your environment is in the West US, your storage account should be in West US or West US2.

    Screenshot of setting up new link.

  5. Select all your Sustainability Manager tables from the list and select Save. You can filter by msdyn_ to view the relevant tables.

    An Azure Synapse Link to your storage account is created, and the selected tables are exported to the storage account. You can add or remove the tables in the link by going back to Azure Synapse Link, selecting the link you created, and then selecting Manage tables.

    Screenshot of managing tables.

  6. Open your Data Lake Storage Gen2 account provided while creating the link in prior steps. A new container starting with Dataverse- is created in your storage account with all the selected tables exported in folder structure in CSV format along with a model.json file that contains the schema for all the tables. Your Sustainability Manager data is now available in your own storage account.

In this step, you link the Data Lake Storage container with Sustainability Manager data to the IngestedRawData lakehouse of the deployed capability using the Fabric shortcut functionality.

  1. Open the workspace in Fabric in which you deployed the Sustainability solution. To open the solution, select it from the list view. Select the environmental, social, and governance (ESG) data estate capability from the Manage deployed capabilities list view to reach managed capability page.

  2. From the ESG data estate managed capability page, select the IngestedRawData lakehouse.

    Screenshot of selecting lakehouse.

  3. Select New shortcut from the ellipses menu of Files. In the New shortcut dialog box, select the Azure Data Lake Storage Gen2 tile, and then provide the following connection details to connect to the Data Lake Storage container that has the exported Sustainability Manager data.

    • URL: Go to the Data Lake Storage account. On the left side, select Endpoint. Then copy the end point for Data Lake Storage and paste it in the URL field.
    • Connection: Select Create new connection.
    • Authentication kind: Select Organization account, sign in to your Data Lake Storage account, and then select Next.

    Screenshot of connection settings.

  4. For the shortcut name, enter MSMData, set the sub path as the container name that contains the Sustainability Manager data prefixed with /, and then select Create. The Sustainability Manager data is successfully integrated with the data lake under Files.

Step 3: Transform data

After the Sustainability Manager data integrates into the IngestedRawData lakehouse, you need to unify and harmonize the data by transforming it into the ESG data model schema. The capability deployment created the ProcessedESGData lakehouse in the workspace for storing the harmonized data.

The ESG data model schema links the ESG data entities to other data entities that describe other functional areas of a company, such as HR, IT, or finance production.

For transforming Sustainability Manager data to the ESG data model schema, the capability provides notebooks and data pipelines that use prebuilt transformation and orchestration technologies using libraries stored in the ConfigAndDemoData lakehouse in your workspace.

Activate the first step and run the TransformMSMDataToProcessedESGData_DTPL data pipeline to load and transform the Sustainability Manager data to the ESG data model schema in the ProcessedESGData lakehouse.

Screenshot of transforming data.

The data pipeline runs these steps:

  1. This step is deactivated by default and needs to be activated before running the pipeline. It loads the linked Sustainability Manager data as lakehouse tables. The transformation libraries require the source data to be in delta table format. The LoadMSMDataToLakehouseTables_INTB notebook loads the linked Sustainability Manager data to the lakehouse tables. After this notebook executes successfully, you can validate the data in the IngestedRawData lakehouse tables.

  2. The transformation of the Sustainability Manager data to the ESG data model schema is a two-stage process:

  • First, the TransformMSMDataToProcessedESGData_INTB notebook transforms the Microsoft Sustainability Manager data to the intermediary raw data import tables.
  • Then, the TransformRawImportESGDataToProcessedESGData_INTB notebook transforms the data from raw data import tables to the ESG data model schema.