Azure Synapse CETAS is not working

Amila Akalanka · 2024-07-08T01:33:11.29+00:00

Hi,

When creating an external table in Synapse SQL Pools (SSP) using the following query, I encountered an error:

"Operation failed since the external data source 'https://<storageacc>.dfs.core.windows.net/<containername>/asa' has underlying storage account that is not in the list of Outbound Firewall Rules on the server. Please add this storage account to the list of Outbound Firewall Rules on your server and retry the operation."

Also, reading data from the same source using a Spark notebook takes 20-30 minutes and then fails.

%%pyspark
df = spark.read.load('abfss://<container>@<storageaccount>.dfs.core.windows.net/test.parquet', format='parquet')
display(df.limit(10))

What am I doing wrong here? The full script is below:

CREATE MASTER KEY ENCRYPTION BY PASSWORD='*******';
GO

CREATE DATABASE SCOPED CREDENTIAL [WorkspaceIdentity] WITH IDENTITY = 'Managed Identity';
GO


CREATE EXTERNAL DATA SOURCE adlsDataSourcenew
WITH ( LOCATION = 'https://<storageaccName>.dfs.core.windows.net/<containerName>/',
CREDENTIAL = WorkspaceIdentity)
GO

CREATE EXTERNAL FILE FORMAT parquet_file_format
WITH
(  
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
)
GO

CREATE EXTERNAL TABLE iris_external_table_new
WITH (
    LOCATION = '/test',
    DATA_SOURCE = adlsDataSourcenew,  
    FILE_FORMAT = parquet_file_format
)
AS
SELECT
    TOP 100 *
FROM dbo.[iris_parquet]
Azure Synapse Analytics

1 answer
phemanth (Microsoft Vendor) · 2024-07-08T09:24:29.7033333+00:00

    @Amila Akalanka

    Thanks for using the MS Q&A platform and posting your query.

    The errors you're encountering in both the SQL and Python code point towards access issues with your Azure Data Lake Storage (ADLS) Gen 2 account. Here's a breakdown of the problems and how to fix them:

    Error Analysis:

    Outbound Firewall Rules: The error message in the SQL code indicates that the Synapse SQL Pool doesn't have outbound access to the storage account you specified. This is because the storage account isn't included in the list of allowed outbound firewall rules for the server.

    Slow and Failing Spark Job: The long hang followed by failure in the Spark notebook is most likely the same access issue, since requests to a blocked or unauthorized storage endpoint tend to hang until they time out. Data volume or the processing in the job itself can also contribute, but rule out access first.

    Solutions:

    Configure an Outbound Firewall Rule:

    • In the Azure portal, open the logical SQL server that hosts your dedicated SQL pool and go to its Networking settings.
    • Under the outbound networking section, add the fully qualified domain name of your ADLS Gen2 account (for example, <storageaccount>.dfs.core.windows.net) to the list of allowed outbound destinations.
    • Save the change and retry the CETAS statement.
    • The same rule can also be created from a script; Az PowerShell provides the New-AzSqlServerOutboundFirewallRule cmdlet, and an equivalent ARM REST call is sketched below.
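    The sketch below shows what that ARM call could look like in Python. Treat it as an illustration rather than a verified recipe: the resource path (Microsoft.Sql/servers/outboundFirewallRules) and the api-version are assumptions to check against the current REST reference, and every name in angle brackets is a placeholder.

    # pip install azure-identity requests
    import requests
    from azure.identity import DefaultAzureCredential

    # Placeholders -- substitute your own values.
    subscription_id = "<subscription-id>"
    resource_group = "<resource-group>"
    server_name = "<logical-sql-server>"
    allowed_fqdn = "<storageaccount>.dfs.core.windows.net"

    # Acquire an ARM token with whatever local credential is available.
    token = DefaultAzureCredential().get_token(
        "https://management.azure.com/.default").token

    # Assumed ARM route and api-version; the rule is identified by the FQDN.
    url = ("https://management.azure.com"
           f"/subscriptions/{subscription_id}"
           f"/resourceGroups/{resource_group}"
           f"/providers/Microsoft.Sql/servers/{server_name}"
           f"/outboundFirewallRules/{allowed_fqdn}"
           "?api-version=2021-11-01")

    resp = requests.put(url, json={}, headers={"Authorization": f"Bearer {token}"})
    print(resp.status_code, resp.text)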

    Spark Notebook Issue:

    • The slowness and eventual failure when reading with Spark is most likely the same networking/permission problem rather than the code itself. Here's what to check:
    • Firewall: ensure the storage account's networking settings allow access from your Synapse workspace, in addition to the outbound rule above for the SQL side.
    • Location: the abfss scheme in your path (abfss://<container>@<storageaccount>.dfs.core.windows.net/test.parquet) is correct; abfss is the TLS-secured variant of the ABFS driver and is the recommended scheme for ADLS Gen2, so do not change it to abfs.
    • Permissions: the identity running the notebook (typically your own Azure AD account in an interactive session) needs at least the Storage Blob Data Reader role on the account. A minimal fast-failing read is sketched after this list.
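    As a quick sanity check once access is fixed, a minimal read that fails fast on authorization or network errors could look like this. It is a sketch with placeholder names, and it assumes a Synapse notebook cell where the spark session and display() are predefined:

    %%pyspark
    # Placeholder path -- substitute your container, account, and file.
    path = 'abfss://<container>@<storageaccount>.dfs.core.windows.net/test.parquet'

    # Equivalent to spark.read.load(path, format='parquet'); reading the
    # Parquet footer for the schema surfaces auth errors quickly.
    df = spark.read.parquet(path)
    df.printSchema()

    # Only then pull a small sample of rows.
    display(df.limit(10))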

    SQL Script Notes:

    • The master key and database scoped credential in your script are required, not superfluous: the CREDENTIAL = WorkspaceIdentity clause of CREATE EXTERNAL DATA SOURCE depends on the database scoped credential, and creating that credential requires a database master key. Those statements are fine; the CETAS failure comes from the outbound firewall restriction, not from the script. You can confirm the objects exist with the catalog queries below.
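    These checks use standard catalog views, so they should run as-is in your SQL pool:

    -- The database master key appears under this fixed name.
    SELECT name FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##';

    -- The database scoped credential created for the managed identity.
    SELECT name, credential_identity FROM sys.database_scoped_credentials;

    -- The external data source and the location it points to.
    SELECT name, location FROM sys.external_data_sources;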

    Beyond the firewall rule, check two more things for the external table creation:

    1. Double-check the LOCATION path in your CREATE EXTERNAL TABLE statement. CETAS writes its result files under '/test' in the container, so the managed identity must be able to create files there.
    2. Verify that the workspace managed identity behind the WorkspaceIdentity credential has the Storage Blob Data Contributor role on the storage account; CETAS writes data, so a read-only role is not enough.

    By adding the outbound firewall rule and verifying the storage permissions, you should be able to create the external table successfully and read data from your ADLS Gen2 storage using both methods.

    Hope this helps. Do let us know if you have any further queries.

