Azure Synaps serverless scan file that should be filtered

Shlomi Lanton 56 Reputation points
2024-07-01T08:48:06.6533333+00:00

Hey,

We have a large amount of files stored in Storage Account in Parquet format with the following hierarchy.
/m_id=%M_ID%/quarter=%QUARTER%/month=%MONTH%/partition_version=last/part-0.parquet'.
examples for quarter values: 2024Q1, 2024Q2, 2024Q3 etc.

We have a view we use the query the data, the view is defined using OPENROWSET , when running a query a WHERE filter and [quarter] in ('2023Q2', '2023Q3', '2023Q4', '2024Q1') we get an error

Error handling external file: 'IO request completed with an error. ERROR = 0x0000000C'. Underlying data description: file '/machine_id=aaaa/quarter=2024Q2/month=6/partition_version=last/part-0.parquet'.

but note that the quarter of 2024Q2 is not in the query, so we expect this file to not be scanned at all.

  1. How can we dig down to this issue?
  2. can we get a list of all files scanned in a query?
  3. We apply multiple filters in the query (on partitions and values), does the order of elements in the query matter?

Thanks

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,613 questions
{count} votes