Self-hosted machine unable to access Data Lake Storage account when running a pipeline using Synapse

Xhevahir Mehalla 140 Reputation points
2024-06-07T08:58:57.6633333+00:00

Hello -

I need your help again:

Here's the story:

  1. I have an Azure Synapse workspace.
  2. I created a managed private endpoint for ADLS; it is working fine.
  3. I created a private endpoint for ADLS; it is working fine.
  4. Public network access on the ADLS account is set to Disabled; we do not want to leave it open.
  5. A VM serves as the self-hosted integration runtime. The VM is on the same VNet as the ADLS account.
  6. I created a linked service for the Oracle data (hosted in OCI). A site-to-site VPN is set up, and everything is working.
  7. The VM has a system-assigned identity, and we assigned it the Storage Contributor role on the ADLS account.
  8. I created a Synapse pipeline to copy data from Oracle to ADLS (source OK, sink OK), but the pipeline fails with error 24200: "Operation on target TestExtractOracleDataFromOCI failed: ErrorCode=AdlsGen2ForbiddenError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 failed for forbidden: Storage operation 'CreateFile' on container 'staging' and path 'FCUSER.ACTB_ACCBAL_HISTORY.csv' get failed with 'Operation returned an invalid status code 'Forbidden''. Possible root causes: (1). It's possible because the IP address of the self-hosted integration runtime machines are not allowed by your Azure Storage firewall settings. (2). If the self-hosted integration runtime use proxy server, it's possible because the IP address of the proxy server is not allowed by your Azure Storage firewall settings.. Account: 'stdatalakeembdev'. FileSystem: 'staging'. Path: 'FCUSER.ACTB_ACCBAL_HISTORY.csv'. ErrorCode: 'AuthorizationFailure'. Message: 'This request is not authorized to perform this operation.'. RequestId: '825f1d70-201f-003b-43b6-b8af0a000000'. TimeStamp: 'Fri, 07 Jun 2024 08:39:10 GMT'..,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Operation returned an invalid status code 'Forbidden',Source=,''Type=Microsoft.Azure.Storage.Data.Models.ErrorSchemaException,Message=Operation returned an invalid status code 'Forbidden',Source=Microsoft.DataTransfer.ClientLibrary,"
  9. When I enable public access on the ADLS account, it works. When I allow the VNet and subnet in the firewall, it works as well. However, I want to disable public access completely and have the private endpoint handle the traffic.

Please can you help!

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. Amira Bedhiafi 18,341 Reputation points
    2024-06-08T14:26:42.6633333+00:00

    This may be a network problem. Verify that your VM and the Data Lake Storage private endpoint are in the same VNet, and that the NSG rules on the subnet where they reside allow communication between them.

    Ensure that no user-defined routes block access between your VM and the Data Lake Storage account.

    Also, verify that the managed private endpoint is correctly set up for the Azure Synapse workspace and is in an approved state.

    Ensure the firewall settings of your ADLS account allow access from the VNet/Subnet where your VM is located.
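
One quick way to test the network checks above is to run a DNS lookup on the self-hosted integration runtime VM and see whether the storage endpoint resolves to a private address. The following is a minimal sketch, not a definitive diagnostic. It assumes the account name from the error message (`stdatalakeembdev`) and the standard ADLS Gen2 `dfs.core.windows.net` suffix of the Azure public cloud; the helper names `resolve_ips` and `uses_private_endpoint` are hypothetical.

```python
import ipaddress
import socket

# Assumption: account name taken from the error message, combined with the
# standard ADLS Gen2 (dfs) endpoint suffix of the Azure public cloud.
ADLS_HOST = "stdatalakeembdev.dfs.core.windows.net"


def resolve_ips(host, port=443):
    """Return the sorted, de-duplicated IP addresses the local resolver gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def uses_private_endpoint(ips):
    """True only if at least one address was returned and all of them are private."""
    return bool(ips) and all(ipaddress.ip_address(ip).is_private for ip in ips)


if __name__ == "__main__":
    try:
        ips = resolve_ips(ADLS_HOST)
    except socket.gaierror as exc:
        print(f"DNS lookup failed for {ADLS_HOST}: {exc}")
    else:
        verdict = "private address" if uses_private_endpoint(ips) else "public address"
        print(f"{ADLS_HOST} -> {ips} ({verdict})")
```

If the name resolves to a public address on the VM, the copy activity is probably not going through the private endpoint, and the private DNS configuration for the VNet would be the next thing to check.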


1 additional answer

  1. Smaran Thoomu 12,090 Reputation points Microsoft Vendor
    2024-06-10T04:56:51.66+00:00

    Hi @Xhevahir Mehalla

    Thanks for the question and for using the MS Q&A platform.

    Based on the error message you provided, it seems that the self-hosted integration runtime machine's IP address is not allowed by your Azure Storage firewall settings. This is why the operation on the target TestExtractOracleDataFromOCI failed with the error code AdlsGen2ForbiddenError.

    To resolve this issue, you need to add the IP address of the self-hosted integration runtime machine to the Azure Storage firewall settings. You can do this by following these steps:

    1. Go to your Azure Storage account, select "Security + networking" in the left-hand menu, and then select the "Firewalls and virtual networks" option.
    2. Under the "Firewalls and virtual networks" section, add the IP address of the self-hosted integration runtime machine to the allowed IP addresses list.
    3. Save the changes and try running the pipeline again.

    If you are still facing issues, you can also check if the self-hosted integration runtime machine is using a proxy server. If it is, then you need to add the IP address of the proxy server to the Azure Storage firewall settings as well.
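
Before adding an address to the allowed list, it may be worth checking whether it is a public address at all. To my knowledge, storage account IP rules accept only public internet addresses, so a private address of a VM inside the VNet (for example `10.x.x.x`) would not be usable there. The sketch below uses only the Python standard library; the helper name `is_public_address` and the sample addresses are hypothetical.

```python
import ipaddress


def is_public_address(address):
    """Return True if address is a globally routable (public) IP address."""
    return ipaddress.ip_address(address).is_global


if __name__ == "__main__":
    # Sample values for illustration only; substitute the address the
    # integration runtime machine (or its proxy) actually uses.
    for candidate in ("10.1.0.4", "192.168.1.10", "52.174.10.20"):
        kind = "public" if is_public_address(candidate) else "private or reserved"
        print(f"{candidate}: {kind}")
```

If the machine only has a private address, the VNet/subnet rule or the private endpoint path is the relevant setting rather than an IP rule.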

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".
