What is semantic link?

Semantic link is a feature that allows you to establish a connection between semantic models and Synapse Data Science in Microsoft Fabric. Use of semantic link is supported only in Microsoft Fabric.

  • For Spark 3.4 and above, semantic link is available in the default runtime when using Fabric, and there's no need to install it.

  • For Spark 3.3 or below, or to update to the latest version of semantic link, run the following command:

    %pip install -U semantic-link
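
To confirm which version a notebook session actually has, you can query the package metadata. The following is a minimal sketch that uses only the Python standard library. It assumes the distribution name matches the `semantic-link` name used in the install command above, and `installed_version` is a hypothetical helper name, not part of semantic link.

```python
from importlib import metadata


def installed_version(package="semantic-link"):
    """Return the installed version string, or None if the package is absent.

    The default distribution name is taken from the pip command above.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string when the package is installed, otherwise None.
print(installed_version())
```

A `None` result on Spark 3.3 or below indicates that the install command hasn't been run in the current session.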
    

The primary goals of semantic link are to:

  • Facilitate data connectivity.
  • Enable the propagation of semantic information.
  • Integrate with established tools that data scientists use, such as notebooks.

Semantic link helps you preserve domain knowledge about data semantics in a standardized way that can speed up data analysis and reduce errors.

The semantic link data flow starts with semantic models that contain data and semantic information. Semantic link bridges the gap between Power BI and the Synapse Data Science experience.

A diagram that shows data flow from Power BI to notebooks in Synapse Data Science and back to Power BI.

Semantic link allows you to use semantic models from Power BI in the Synapse Data Science experience to perform tasks such as in-depth statistical analysis and predictive modeling with machine learning techniques. You can store the output of your data science work in OneLake by using Apache Spark, and then ingest the stored output into Power BI by using Direct Lake.

Power BI connectivity

A semantic model serves as a single tabular object model that provides reliable sources for semantic definitions such as Power BI measures. Semantic link connects to semantic models in the following ecosystems, making it easy for data scientists to work in the system they're most familiar with.

  • Python pandas ecosystem, through the SemPy Python library.
  • Apache Spark ecosystem, through the Spark native connector. This implementation supports various languages, including PySpark, Spark SQL, R, and Scala.
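
For the pandas route, the sketch below shows roughly what reading from a semantic model looks like with SemPy. It's an illustration under stated assumptions, not a complete walkthrough: the `sempy.fabric` module is importable only inside a Fabric notebook, so the import is deferred, and the semantic model, table, measure, and column names in the comments are hypothetical placeholders. The wrapper functions `load_table` and `measure_by_group` are also hypothetical. The Spark native connector isn't covered here.

```python
def load_table(dataset, table, fabric=None):
    """Read one table of a semantic model through SemPy.

    `fabric` is normally the sempy.fabric module. Passing it in explicitly
    lets this sketch run outside Fabric with a stand-in object.
    """
    if fabric is None:
        import sempy.fabric as fabric  # available in Fabric notebooks only
    return fabric.read_table(dataset, table)


def measure_by_group(dataset, measure, groupby_columns, fabric=None):
    """Evaluate a Power BI measure, grouped by the given model columns."""
    if fabric is None:
        import sempy.fabric as fabric  # available in Fabric notebooks only
    return fabric.evaluate_measure(
        dataset, measure=measure, groupby_columns=groupby_columns
    )


# Inside a Fabric notebook, with placeholder names:
#   customers = load_table("Sales Model", "Customer")
#   revenue = measure_by_group("Sales Model", "Total Revenue", ["Customer[State]"])
```

Because the measure is evaluated by the semantic model itself, the notebook doesn't need to reproduce the measure's definition.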

Applications of semantic information

Semantic information in data includes Power BI data categories such as address and postal code, relationships between tables, and hierarchical information.

These data categories comprise metadata that semantic link propagates into the Synapse Data Science environment to enable new experiences and maintain data lineage.

Some example applications of semantic link include:

  • Intelligent suggestions of built-in semantic functions.
  • Augmentation of data with Power BI measures, by using the add_measure method.
  • Tools for data quality validation based on the relationships between tables and functional dependencies within tables.
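
To make the functional-dependency idea concrete: a column B depends on a column A when each value of A maps to exactly one value of B, so any A value that maps to several B values signals a likely data quality problem. The following is a conceptual sketch in plain Python with made-up rows. It isn't how semantic link implements its validation tools, and `dependency_violations` is a hypothetical helper name.

```python
from collections import defaultdict


def dependency_violations(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


# Made-up sample data: the last row misspells the city for postal code 98101.
rows = [
    {"postal_code": "98052", "city": "Redmond"},
    {"postal_code": "98052", "city": "Redmond"},
    {"postal_code": "98101", "city": "Seattle"},
    {"postal_code": "98101", "city": "Seatle"},
]

print(dependency_violations(rows, "postal_code", "city"))
# {'98101': ['Seatle', 'Seattle']}
```

Here the postal code is expected to determine the city, so the two spellings reported for one postal code point at the row that needs cleaning.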

Semantic link lets data scientists reuse the business logic that business analysts have already embedded in Power BI measures, instead of reimplementing it in a data science environment. Because both groups work from the same definitions, they can collaborate without duplicating effort.

FabricDataFrame data structure

FabricDataFrame is the primary data structure that semantic link uses to propagate semantic information from semantic models into the Synapse Data Science environment.

A diagram that shows data flow from connectors to semantic models to FabricDataFrame to semantic functions.

The FabricDataFrame class:

  • Supports all pandas operations.
  • Subclasses the pandas DataFrame and adds metadata, such as semantic information and lineage.
  • Exposes semantic functions and the add_measure method, which lets you use Power BI measures in data science work.
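
The sketch below illustrates the last point: it reads a table as a FabricDataFrame and then attaches a Power BI measure to it. Treat it as an assumption-laden example rather than a reference. The `sempy.fabric` module is available only in Fabric notebooks, so it's imported lazily, `add_measure_to_table` is a hypothetical wrapper, and the semantic model, table, and measure names in the comment are placeholders.

```python
def add_measure_to_table(dataset, table, measure, fabric=None):
    """Read a table as a FabricDataFrame and add a Power BI measure to it.

    `fabric` is normally the sempy.fabric module. Passing a stand-in object
    lets this sketch run outside Fabric.
    """
    if fabric is None:
        import sempy.fabric as fabric  # available in Fabric notebooks only
    frame = fabric.read_table(dataset, table)  # a FabricDataFrame in Fabric
    # The measure is resolved against the named semantic model.
    return frame.add_measure(measure, dataset=dataset)


# Inside a Fabric notebook, with placeholder names:
#   enriched = add_measure_to_table("Sales Model", "Customer", "Total Revenue")
```

Because FabricDataFrame subclasses the pandas DataFrame, the result can be passed to regular pandas operations such as sorting or filtering.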