Real-time data warehouse with Azure Synapse

BIFlake 1 Reputation point
2020-08-10T23:44:15.33+00:00

Hi,
I'm working on a data warehouse POC on Azure, and one of the key questions I'm currently working through is which solution options allow the data to be refreshed at different rates (every hour, every 4 hours, or near real-time). I also need an approximate idea of how the cost will vary with the refresh rate (e.g. hourly vs. real-time).

According to Microsoft's reference architectures, near-real-time or real-time solutions are mostly built with streaming components (e.g. Stream Analytics, Striim), whereas for non-streaming or batch processing the data is ingested as follows:

sources > Data Factory > Data Lake > PolyBase/Databricks > Synapse > Power BI

Can this batch option also be refreshed every hour, or even in real time? And what would be a more logical way to come up with cost estimates?

Thanks.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

  1. HarithaMaddi-MSFT 10,136 Reputation points
    2020-08-11T09:27:27.673+00:00

    Hi @BIFlake,

    Welcome to Microsoft Q&A Platform.

    For batch processing, Databricks notebooks can be executed from Data Factory, and a Power BI dataset refresh can also be triggered from Data Factory as the last activity of the pipeline. Triggers (schedule or event) in Azure Data Factory let you run pipelines on a schedule or in response to an event. Currently, 1 minute is the shortest interval for a schedule trigger, and the cost can be seen on the Azure Data Factory pricing page.
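    To see how the refresh interval translates into workload, here is a minimal sketch that models a fixed-interval schedule trigger locally and counts how many pipeline runs it produces per day. It is plain Python under stated assumptions, not the Data Factory SDK or trigger JSON; `fire_times`, `runs_per_day` and `MIN_INTERVAL_MINUTES` are hypothetical names, with the constant mirroring the 1-minute minimum mentioned above.

```python
from datetime import datetime, timedelta
from typing import List

# Assumption: a local model of a fixed-interval schedule trigger, not the
# Data Factory SDK. MIN_INTERVAL_MINUTES mirrors the 1-minute minimum above.
MIN_INTERVAL_MINUTES = 1


def fire_times(start: datetime, end: datetime, interval_minutes: int) -> List[datetime]:
    """Return every time the trigger would fire in the window [start, end)."""
    if interval_minutes < MIN_INTERVAL_MINUTES:
        raise ValueError(f"interval must be at least {MIN_INTERVAL_MINUTES} minute(s)")
    step = timedelta(minutes=interval_minutes)
    times: List[datetime] = []
    t = start
    while t < end:
        times.append(t)
        t += step
    return times


def runs_per_day(interval_minutes: int) -> int:
    """Count the pipeline runs the trigger starts in one 24-hour day."""
    day_start = datetime(2020, 8, 10)  # arbitrary reference date
    return len(fire_times(day_start, day_start + timedelta(days=1), interval_minutes))


if __name__ == "__main__":
    for label, minutes in [("every minute", 1), ("hourly", 60), ("every 4 hours", 240)]:
        print(f"{label}: {runs_per_day(minutes)} pipeline runs per day")
```

    Moving from hourly to every-minute refreshes multiplies the run count by 60 (24 vs. 1,440 runs per day). Data Factory orchestration is billed per activity run, so this count is one of the quantities to feed into the pricing estimate, alongside compute such as Databricks cluster time.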

    The Azure pricing calculator lets you configure and estimate costs for Azure products. It also includes example scenarios, some of which contain real-time analytics components, and shows the estimate for the whole set of resources at once. Estimates are monthly by default, but you can edit every configuration property of each resource and also estimate costs on an hourly basis. Some resources, such as Azure Synapse serverless, are billed by the amount of data processed, so their cost is estimated from the data-processed parameters.
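    The two billing models above respond differently to the refresh rate, which is the crux of the original cost question. The sketch below contrasts them: a provisioned resource is billed per hour online regardless of how often you refresh, while a serverless resource is billed per data processed, so its cost scales with refresh frequency. All prices and workload sizes are hypothetical placeholders, not Azure list prices; take real figures from the pricing calculator. The only fixed number is the calculator's default month of 730 hours, and treating 1 TB as 1024 GB is an assumption of this sketch.

```python
from dataclasses import dataclass

# The Azure pricing calculator assumes 730 hours in a month by default.
HOURS_PER_MONTH = 730

# Hypothetical placeholder prices; take real ones from the pricing calculator.
PLACEHOLDER_HOURLY_RATE = 1.50   # provisioned compute, per hour
PLACEHOLDER_PRICE_PER_TB = 5.00  # serverless, per TB of data processed


@dataclass
class RefreshScenario:
    name: str
    refreshes_per_day: int
    gb_processed_per_refresh: float  # hypothetical workload size


def provisioned_monthly_cost(hourly_rate: float, hours_online: float = HOURS_PER_MONTH) -> float:
    """Provisioned resources are billed per hour online, independent of refresh count."""
    return hourly_rate * hours_online


def serverless_monthly_cost(s: RefreshScenario, price_per_tb: float) -> float:
    """Serverless is billed per data processed, so cost grows with refresh frequency.

    Assumption: 1 TB = 1024 GB, and a month is HOURS_PER_MONTH / 24 days.
    """
    refreshes_per_month = s.refreshes_per_day * HOURS_PER_MONTH / 24
    tb_processed = refreshes_per_month * s.gb_processed_per_refresh / 1024
    return tb_processed * price_per_tb


if __name__ == "__main__":
    print(f"provisioned, always on: {provisioned_monthly_cost(PLACEHOLDER_HOURLY_RATE):.2f}")
    for scenario in [
        RefreshScenario("every 4 hours", 6, 2.0),
        RefreshScenario("hourly", 24, 2.0),
        RefreshScenario("every minute", 1440, 2.0),
    ]:
        cost = serverless_monthly_cost(scenario, PLACEHOLDER_PRICE_PER_TB)
        print(f"serverless, {scenario.name}: {cost:.2f}")
```

    With these placeholder inputs, the serverless cost of an every-minute refresh is exactly 60 times the hourly one, while the provisioned cost does not change; comparing the two lines for your real workload shows where one model becomes cheaper than the other.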

    Hope this helps! Please let us know if you have further queries and we will be glad to assist.

    1 person found this answer helpful.

  2. HarithaMaddi-MSFT 10,136 Reputation points
    2020-08-14T10:48:31.04+00:00

    Hi @BIFlake,

    Thanks for posting the query. Power BI can connect to Databricks to read data, and, as shown in the diagram above, Databricks interacts with Synapse to store and transform the data.

    Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. Designed with the founders of Apache Spark, Databricks is integrated with Azure to provide one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.

    Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.

    As I understand it, Databricks is a managed Apache Spark platform, whereas Synapse Analytics is a managed SQL data warehouse. If the business requirement involves reporting that needs a data warehouse (a single store for multiple data sources, query performance, etc.), Synapse Analytics would still be needed, even when the data is processed in real time.

    Semantic and dimensional models can be created in Power BI, which lets you define roles and relationships using its built-in storage. For requirements with critical response times in Power BI, Azure Analysis Services is preferred: it lets you build complex aggregations faster, and its powerful DAX support gives more flexibility in reporting.
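    To make "dimensional model" and "relationships" concrete, here is a tool-agnostic Python sketch of a tiny star schema: a fact table related to a product dimension through a key, aggregated by a dimension attribute. The tables, names and data are hypothetical; in Power BI or Azure Analysis Services the relationship would be defined in the model and the aggregation written in DAX, not in Python.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical dimension table: product_key -> category attribute
dim_product: Dict[int, str] = {1: "Bikes", 2: "Bikes", 3: "Accessories"}

# Hypothetical fact table rows: (product_key, sales_amount)
fact_sales: List[Tuple[int, float]] = [(1, 500.0), (2, 750.0), (3, 40.0), (3, 60.0)]


def sales_by_category(facts: List[Tuple[int, float]], dim: Dict[int, str]) -> Dict[str, float]:
    """Aggregate fact rows by a dimension attribute via the product_key relationship."""
    totals: Dict[str, float] = defaultdict(float)
    for product_key, amount in facts:
        # Follow the many-to-one relationship from fact row to dimension row.
        totals[dim[product_key]] += amount
    return dict(totals)


if __name__ == "__main__":
    print(sales_by_category(fact_sales, dim_product))
```

    Computing totals like these once at refresh time, instead of on every report query, is the general idea behind using precomputed aggregations to shorten response times.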

    Hope this clarifies! Please let us know if you have further queries and we will be glad to assist.

