Possible bug or issue in Synapse dedicated SQL pool when exporting parquet files

demolition porg 11 Reputation points
2023-04-06T14:05:05.62+00:00

I'm not sure if this is really a bug, but its definitly a frustration for me at least ;-). When trying to write data from Synapse dedicated SQL pools to data lake storage as parquet files (using a CETAS statement) it produces files with non-standard parquet file extensions. You would expect it to write the files with the common parquet file extension style of '.parquet' or '..snappy.parquet' but it only generates files with names ending in '.parq' which is in contrast to running the same CETAS code in Synapse serverless SQL which does generate files ending in '.parquet' I can't seem to find a way around this, apart from running REST API calls to rename the files after they are produced. IMHO I'd say that the dedicated and the serverless pools should produce consistent results when running this code. Anyone got any better ways around this or if its something that will be fixed/updated in a future dedicated SQL pool release? This is a sample piece of code (note that it produces the '.parq' or '.snappy.parq' depending on the compression used or not):-

CREATE EXTERNAL DATA SOURCE [SomeExternalDataSource] WITH (
    TYPE = HADOOP
    ,LOCATION = N'abfss://somecontainer@somestorageaccount.dfs.core.windows.net/SomeFolder'
)

CREATE EXTERNAL FILE FORMAT [CompressedParquetFormat] WITH (
    FORMAT_TYPE = PARQUET
    ,DATA_COMPRESSION = N'org.apache.hadoop.io.compress.SnappyCodec'
)

CREATE EXTERNAL TABLE [dbo].[SomeExternalTable] WITH (
    LOCATION = 'FolderOne/FolderTwo/Etc'
    ,DATA_SOURCE = [SomeExternalDataSource]
    ,FILE_FORMAT = [CompressedParquetFormat]
) AS 

SELECT 
[Field1]
,[Field2]
,[and_so_on]

FROM 
[SomeSchema].[SomeTable]
Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,409 questions
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,621 questions
{count} votes