Azure SQL Data Warehouse distribution

SQLGeek 1 Reputation point
2020-08-31T10:25:08.49+00:00

Hi,

I am working on Azure SQL Data Warehouse(Synapse) in it there is a fact table with nearly 10 billion rows spread across 40 columns. It is a 100DWU server. When I try to fetch records from this table it takes more then 30 mins to pull 9 million rows based on a filter condition.

I have been reading on performance tuning and optimization in Synapse, 1 point I came across is data distribution type of a fact table (round robin, Hash, Replicated). I am using round robin as of now as I do not have any key which have atleast 60 unique values, or a key which can distribute the data evenly in 60 distribution.

Other then this I cannot find any other resources which can help me to boost the select queries on this table. So my question is, is it necessary to increase DWU or is there any other tweak which I can do to improve the performance of simplest query on such table.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,631 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
    2020-09-02T00:31:42.683+00:00

    Hello @SQLGeek ,

    Thanks for posting the question and welcome to Microsoft Q & A .

    It was difficult to troubleshoot a performance issue without accessing the data , but below are few pointers .

    1. Please check things like unhealthy indexes, bad statistics, running a heavy query etc.I would recommend starting with a scale up, a rebuild of the cluster Columnstore Index on the fact table with a largeRC/static50 and a stats update with fullscan. Read more here .
    2. check the cluster Columnstore Index health of the fact table (how many compressed row groups and how big they are)?
    3. What is that we are filtering on , Is it a “simple” select * from fact_table where ColumnX = ‘value”? Or does the query have a more complex structure

    Scaled at DW100, only have a single compute resource to handle all 60 distributions in the table, translating that to a SQL environment, they are running 60 queries in parallel on a single vm. At that low a level, each of those queries is getting a very small resource allocation.

    Please let me know how it goes .

    Thanks Himanshu

    Please do consider to click on "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members