How to set up ADF Python script to access an external SFTP with Firewall Exceptions (whitelist)?

Question

I am working on an ADF pipeline. One of the steps will include a Python script that connects to an external SFTP, download some files and upload them to my Storage Account.

The SFTP owner asked to share the IP that he should add to Firewall exceptions so the code can connect with the SFTP.

My current setup:

Virtual Network
Public IP Address with static IP and DNS Name Label
A Pool in the Azure Batch Service. In the Network Configuration step I used my Vnet from 1 and default Subnet, usermanaged IP address provisioning type and assigned PublicIP ID.
I can now see that the IP of Node that was created is the same as the one from 2.

Shall it work? I tried to prepare a POC of this solution that follows this path with copying a file from one container of Storage Account to Another.

It works with the Public Access in the Networking blade enabled
It works with the Enabled from selected virtual networks and IP addresses selected and my Virtual Network Added
It works with the Enabled from selected virtual networks and IP addresses selected and my local IP added to the whitelist when I run it locally
It DOES NOT WORK with the Enabled from selected virtual networks and IP addresses selected and The Node Public IP (from 2 / 4) added to the whitelist. What might be the reason for that and how to cope with it?

The code is pretty simple:

from azure.storage.blob import BlobClient
import pandas as pd
from io import BytesIO
import requests

# Print current IP address
response = requests.get('https://api.ipify.org?format=json')
ip_address = response.json()['ip']
print(f'Current IP Address: {ip_address}')

# Define parameters
connectionString = "connectionstring"
inputContainerName = "input"
inputBlobName = "iris.csv"
outputContainerName = "output"
outputBlobName = "iris_setosa.csv"

# Establish connection with the blob storage account for input container
input_blob = BlobClient.from_connection_string(conn_str=connectionString, container_name=inputContainerName, blob_name=inputBlobName)

# Download the blob as a stream
input_stream = input_blob.download_blob()
df = pd.read_csv(BytesIO(input_stream.readall()))

# Take a subset of the records
df = df[df['Species'] == "setosa"]

# Save the subset of the iris dataframe locally in memory
output_stream = BytesIO()
df.to_csv(output_stream, index=False)
output_stream.seek(0)  # Reset the stream position to the beginning

# Establish connection with the blob storage account for output container
output_blob = BlobClient.from_connection_string(conn_str=connectionString, container_name=outputContainerName, blob_name=outputBlobName)

# Upload the stream to the output container
output_blob.upload_blob(output_stream, overwrite=True)

Answer

You can access SFTP as a source in ADF and copy data into blob storage via Copy activity.

There are 2 ways to whitelist the IP in SFTP :

In case if you need a single static IP, you can host a self hosted IR and whitelist that servers public IP within the SFTP
Every ADF region has a specific range of IP address which you can whitelist

Similar thread :

https://video2.skills-academy.com/en-us/answers/questions/1513454/how-to-generate-a-static-ip-for-adf-to-connect-to

Share via

How to set up ADF Python script to access an external SFTP with Firewall Exceptions (whitelist)?

1 answer