How to write directly from Amazon s3 bucket to unity catalog volume storage?

Jeremy Lee 0 Reputation points
2024-09-03T23:55:01.0433333+00:00

Hello,

I am trying to save files from an S3 bucket into volume storage on Unity Catalog. I am using the boto3 library, and when using the download_file command I get the error below. I was essentially running the code below in a for loop to pull each file into my volume storage. I was able to pull down two files without failure, but I'm not sure why subsequent files then raised the OS error.

s3.download_file(bucket, file_name, '/Volumes/path/to/subdirectory/file.json')

OSError: [Errno 95] Operation not supported

(truncated stack trace omitted; only source line numbers survived in the original post)


2 answers

  1. Sina Salam 11,991 Reputation points
    2024-09-04T08:32:22.68+00:00

    Hello Jeremy Lee,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are getting an OS error while trying to download files from an Amazon S3 bucket directly into Unity Catalog volume storage.

    The error OSError: [Errno 95] Operation not supported suggests a problem with the file system or the path you are saving to. You will need to check file system compatibility, confirm that the path /Volumes/path/to/subdirectory/ exists and is writable, and verify file permissions.

    First, check your Databricks setup to confirm that Unity Catalog and the storage volume are configured correctly and support your operations. Direct appends and random writes are not supported by design in FUSE v2, which is used in Databricks Runtime 6.0 and above. This is also a likely explanation for why only some files failed: boto3's download_file switches to a multipart download for larger files, and multipart downloads write chunks at offsets (seeks), which the FUSE mount rejects, while small single-part downloads are written sequentially and succeed. https://kb.databricks.com/dbfs/errno95-operation-not-supported
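    One workaround that sidesteps the seek problem is to stream the object sequentially with get_object instead of download_file. Below is a minimal sketch; the function name, bucket, key, and destination path are placeholders, and it assumes any S3-compatible client whose get_object returns a streaming Body (such as a boto3 client):

```python
# Sketch: stream an S3 object front-to-back in chunks, so the destination
# file is only ever appended to -- no seeks hit the FUSE-backed volume.

def download_sequential(s3_client, bucket, key, dest_path, chunk_size=8 * 1024 * 1024):
    """Write the object sequentially; never seeks within dest_path."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    with open(dest_path, "wb") as out:
        for chunk in response["Body"].iter_chunks(chunk_size=chunk_size):
            out.write(chunk)

# Usage with boto3 (assumes credentials are already configured):
#   import boto3
#   s3 = boto3.client("s3")
#   download_sequential(s3, "your-bucket", "your-file.json",
#                       "/Volumes/path/to/subdirectory/file.json")
```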

    Secondly, make sure your volume storage is correctly mounted and accessible. The sample code below checks the file system and path:

    import os
    path = '/Volumes/path/to/subdirectory/'
    if not os.path.exists(path):
        print(f"Path {path} does not exist.")
    elif not os.access(path, os.W_OK):
        print(f"Path {path} is not writable.")
    else:
        print(f"Path {path} is accessible and writable.")
    

    Finally, instead of downloading files directly to the volume storage, you can download them to a temporary local directory first and then move them to the desired location (a cross-filesystem shutil.move falls back to a sequential copy, which the volume supports).

    import boto3
    import shutil
    import os
    s3 = boto3.client('s3')
    bucket = 'your-bucket'
    file_name = 'your-file.json'
    temp_file_path = '/tmp/' + file_name
    final_file_path = '/Volumes/path/to/subdirectory/' + file_name
    # Download file to a temporary location
    s3.download_file(bucket, file_name, temp_file_path)
    # Move file to the final location
    shutil.move(temp_file_path, final_file_path)
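    Since the original code ran download_file in a for loop, the same temp-then-move pattern can be applied per object. The sketch below is an illustration only: the function name, bucket, prefix, and directory paths are placeholder assumptions, and it expects a boto3-style S3 client:

```python
import os
import shutil

# Sketch: download every object under a prefix to a local temp directory,
# then move each file into the volume. shutil.move across filesystems does
# a sequential copy, which the FUSE-backed volume supports.

def copy_prefix_to_volume(s3_client, bucket, prefix, volume_dir, tmp_dir="/tmp"):
    moved = []
    token = None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        if token:
            kwargs["ContinuationToken"] = token
        page = s3_client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            name = os.path.basename(obj["Key"])
            if not name:  # skip "directory" placeholder keys ending in "/"
                continue
            tmp_path = os.path.join(tmp_dir, name)
            s3_client.download_file(bucket, obj["Key"], tmp_path)
            shutil.move(tmp_path, os.path.join(volume_dir, name))
            moved.append(name)
        token = page.get("NextContinuationToken")
        if token is None:
            return moved

# Usage with boto3:
#   import boto3
#   copy_prefix_to_volume(boto3.client("s3"), "your-bucket", "data/",
#                         "/Volumes/path/to/subdirectory/")
```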
    

    Should the above not resolve the issue as expected, the Databricks knowledge base article linked above describes the same error and how it has been solved.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    ** Please don't forget to close up the thread here by upvoting and accept it as an answer if it is helpful ** so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam

    0 comments No comments

  2. Nehruji R 8,066 Reputation points Microsoft Vendor
    2024-09-04T09:53:24.8566667+00:00

    Hello Jeremy Lee,

    Greetings! Welcome to Microsoft Q&A Platform.

    The OSError: [Errno 95] Operation not supported typically indicates that the operation you’re trying to perform is not supported by the underlying filesystem or storage system. Ensure that the filesystem of your Unity Catalog volume supports the operations you’re trying to perform. Some filesystems may have limitations on certain operations.

    Similar thread for reference - https://stackoverflow.com/questions/59869276/python-boto3-download-files-from-s3-error-connection-broken-oserror

    In your scenario, if you are looking for a fully managed Platform-as-a-Service (PaaS) option for migrating data from AWS S3 to Azure Storage, I would recommend Azure Data Factory (ADF): How to Copy Multiple Files from Amazon S3 to Azure Blob Storage by using ADF-Data Factory

    Alternatively, you can copy the data with AzCopy: Copy data from Amazon S3 to Azure Storage by using AzCopy

    This article describes common issues that you might encounter while using AzCopy, helps you to identify the causes of those issues, and then suggests ways to resolve them.

    Additional information: "Azcopy not copying from AWS S3 to Azure Blob" (https://github.com/Azure/azure-storage-azcopy/issues/1618), which also discusses whether AzCopy supports third-party vendors that sit on top of AWS.

    Hope this answer helps! Please let us know if you have any further queries. I'm happy to assist you further.


    Please "Accept the answer” and “up-vote” wherever the information provided helps you, this can be beneficial to other community members.

    0 comments No comments
