How to read/write data from Azure filestorage/fileshare into Python script

Question

I'm trying to read/write joblib and csv files from Azure File Storage/File Share in a Python script. To check that it's working, I'll read/write from VS Code running locally, and then finally run it from a container in ACI. The code below works, but the downloaded file is a StorageStreamDownloader object. Can this be converted into a Python object (for a joblib file) or a Pandas DataFrame (for a csv file)?

from azure.storage.fileshare import ShareFileClient
import pandas as pd

connection_string=f'DefaultEndpointsProtocol=https;AccountName=XXXX;AccountKey={key};EndpointSuffix=core.windows.net'
filename='test.csv'


file_client = ShareFileClient.from_connection_string(conn_str=connection_string, share_name="XXX", file_path=filename)
with open("DEST_FILE", "wb") as file_handle:
    data = file_client.download_file()
    data.readinto(file_handle)

df=pd.DataFrame(data)  # raises ValueError

Answer

Hi HughA,

Thanks for reaching out to Microsoft Q&A.

You can convert the StorageStreamDownloader object into a Python object (for a joblib file) or a Pandas DataFrame (for a CSV file). To do this, read the bytes from the stream with readall(), wrap them in an io.BytesIO buffer, and pass the buffer to joblib.load or pd.read_csv.

Try the modified script below locally in VS Code to check that it works, and then use it in your container in ACI.

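import io

import joblib
import pandas as pd
from azure.storage.fileshare import ShareFileClient

# Placeholders: substitute your own connection string and share name
connection_string = "your_connection_string"
share_name = "your_share_name"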

def download_file_to_memory(connection_string, share_name, filename):
    file_client = ShareFileClient.from_connection_string(conn_str=connection_string, share_name=share_name, file_path=filename)
    downloader = file_client.download_file()
    downloaded_bytes = downloader.readall()
    return downloaded_bytes

# Example for CSV file
csv_filename = 'test.csv'
csv_data = download_file_to_memory(connection_string, share_name, csv_filename)

# Read CSV data into a Pandas DataFrame
csv_io = io.BytesIO(csv_data)
df = pd.read_csv(csv_io)
print(df)

# Example for joblib file
joblib_filename = 'model.joblib'
joblib_data = download_file_to_memory(connection_string, share_name, joblib_filename)

# Load joblib data into a Python object
joblib_io = io.BytesIO(joblib_data)
model = joblib.load(joblib_io)
print(model)

Note: the above is purely an example; you will have to modify it or add code blocks and libraries to suit your needs.
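
The question also asks about writing, which the script above does not cover. Here is a minimal sketch of the write direction, assuming the DataFrame or model is serialized in memory and sent with ShareFileClient.upload_file(). The helper names (dataframe_to_csv_bytes, object_to_joblib_bytes, upload_bytes, make_file_client) are hypothetical, and the target file path is assumed to sit in a directory that already exists in the share.

```python
import io

import joblib
import pandas as pd


def dataframe_to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialize a DataFrame to UTF-8 encoded CSV bytes, without the index column."""
    return df.to_csv(index=False).encode("utf-8")


def object_to_joblib_bytes(obj) -> bytes:
    """Serialize a Python object with joblib into bytes held in memory."""
    buffer = io.BytesIO()
    joblib.dump(obj, buffer)
    return buffer.getvalue()


def upload_bytes(file_client, payload: bytes) -> None:
    """Upload bytes through any client exposing upload_file(), such as ShareFileClient."""
    file_client.upload_file(payload)


def make_file_client(connection_string: str, share_name: str, file_path: str):
    """Build a ShareFileClient the same way the download script does."""
    # Imported here so the serializers above can be tried without the Azure SDK installed
    from azure.storage.fileshare import ShareFileClient

    return ShareFileClient.from_connection_string(
        conn_str=connection_string, share_name=share_name, file_path=file_path
    )
```

For example, upload_bytes(make_file_client(connection_string, share_name, "out.csv"), dataframe_to_csv_bytes(df)) would send df to the share, where "out.csv" is a placeholder name. Passing the client in as an argument lets you try the serializers locally with a stand-in object before running against the real share in ACI.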


Answer

Hello HughA,

Greetings! Welcome to Microsoft Q&A Platform.

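Yes, you can convert the StorageStreamDownloader object into a Python object (from a joblib file) or a Pandas DataFrame (from a CSV file). The samples below use the Blob Storage client, but ShareFileClient.download_file() on a file share also returns a StorageStreamDownloader, so the same conversion applies.

For a joblib file, first download the blob as a stream, then load it with the joblib library: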

import joblib
from io import BytesIO

# streamdownloader is the StorageStreamDownloader returned by the download call
# Read the stream into a BytesIO object
stream = BytesIO()
streamdownloader.readinto(stream)
stream.seek(0)  # Reset the stream position to the beginning
# Load the Python object stored in the joblib file
model = joblib.load(stream)

For a CSV file, use the pandas library to read the data from the StorageStreamDownloader object.

First, download the blob as a stream:

from azure.storage.blob import BlobServiceClient
from io import StringIO
import pandas as pd

# Initialize the BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string("your_connection_string")
container_client = blob_service_client.get_container_client("your_container_name")
blob_client = container_client.get_blob_client("your_blob_name")

# Download the blob as a StorageStreamDownloader object
streamdownloader = blob_client.download_blob()

Then convert it into the desired format:

# Read the stream into a pandas DataFrame
downloaded_blob = streamdownloader.readall()
df = pd.read_csv(StringIO(downloaded_blob.decode('utf-8')))

Note: These are sample scripts; please modify the code to fit your requirements.
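
readall() holds the whole file in memory, which may not suit large files. Here is a minimal sketch of an alternative, assuming the downloader exposes chunks() as the Azure SDK's StorageStreamDownloader does: write the download to a local path piece by piece. The function name download_to_path is hypothetical.

```python
def download_to_path(downloader, dest_path: str) -> int:
    """Write a download to dest_path chunk by chunk and return the number of bytes written."""
    total = 0
    with open(dest_path, "wb") as handle:
        # chunks() yields the content as successive bytes objects
        for chunk in downloader.chunks():
            handle.write(chunk)
            total += len(chunk)
    return total
```

Once the file is saved, the local path can be passed directly to pd.read_csv(dest_path) or joblib.load(dest_path).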

Reference: https://video2.skills-academy.com/en-us/python/api/azure-storage-blob/azure.storage.blob.storagestreamdownloader?view=azure-python

Similar thread for reference: https://stackoverflow.com/questions/33091830/how-best-to-convert-from-azure-blob-csv-format-to-pandas-dataframe-while-running

Hope this answer helps! Please let us know if you have any further queries. I’m happy to assist you further.

