Import filenames from Azure Storage container to SQL Table using Azure Data Factory

PS 396 Reputation points
2024-06-27T01:33:27.9933333+00:00

Hi All,

I am trying to import filenames from an Azure Storage container into a table in a Dedicated SQL Pool (formerly SQL DW) using Azure Data Factory.

For example, I have around 2,000 JSON files in an Azure Storage container, and I would like to import all the filenames into a table. I want to avoid using a foreach loop, as the file count might increase, making the process take too long.

I have tried Get Metadata --> Stored Procedure as suggested in this post (https://stackoverflow.com/questions/64227251/azure-data-factory-get-metadata-to-get-blob-filenames-and-transfer-them-to-azure), but the combined length of the filenames exceeds the maximum allowed (4,000 characters) for NVARCHAR(MAX) in the Dedicated SQL Pool.

Sample metadata output:

{
	"childItems": [
		{
			"name": "sample_006c0718-a1d6-d56a-1a5d-b550a023bb63.json",
			"type": "File"
		},
		{
			"name": "sample_008f8110-b60e-f7c2-03db-85909b729aa6.json",
			"type": "File"
		},
		{
			"name": "sample_00e1d0fd-7e8d-3abb-5fb7-5e598543b756.json",
			"type": "File"
		}
	],
	"effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (Central US)",
	"executionDuration": 1,
	"durationInQueue": {
		"integrationRuntimeQueue": 0
	},
	"billingReference": {
		"activityType": "PipelineActivity",
		"billableDuration": [
			{
				"meterType": "AzureIR",
				"duration": 0.016666666666666666,
				"unit": "Hours"
			}
		]
	}
}
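
For a rough sense of scale, here is a quick back-of-the-envelope check in Python (the filename is taken from the sample output above, and the 2,000-file count is approximate) showing why concatenating all the names into a single stored procedure parameter overruns 4,000 characters:

# Back-of-the-envelope check: one filename from the sample metadata above,
# multiplied by roughly 2,000 files, is far beyond a 4,000-character parameter.
sample_name = "sample_006c0718-a1d6-d56a-1a5d-b550a023bb63.json"

per_name = len(sample_name) + 1          # +1 for a separator such as a comma
estimated_total = 2000 * per_name        # ~2,000 files in the container

print(f"One filename: {len(sample_name)} characters")
print(f"~2,000 filenames concatenated: ~{estimated_total} characters (>> 4,000)")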

Can someone suggest a way to achieve this?

Thank you!


1 answer

  1. Nehruji R 3,971 Reputation points Microsoft Vendor
    2024-06-28T08:17:33.12+00:00

    Hello PS,

    Greetings! Welcome to Microsoft Q&A Platform.

    I understand that you would like to import filenames from an Azure Storage container into a table in a Dedicated SQL Pool (formerly SQL DW) using Azure Data Factory, and that you are running into an issue with the Get Metadata activity. You are correct: the Get Metadata activity cannot return an output larger than 4 MB. When dealing with large numbers of filenames, as in your scenario, this limit can cause problems. To work around it, consider the following approaches:

    1. One option is to reorganize the source files into separate folders of fewer than 5,000 files each, so that each folder can be read by its own Get Metadata activity. Since this requires a change on the source side, another option is to use an Azure Function to list the files and either pass the result to a ForEach activity or implement the entire requirement inside the function (see the reference thread, and the sketch after this list).

    2. Use a Data Flow for iteration: if your files are non-binary (e.g., CSV), generate a file containing the filenames (in batches of up to 5,000) and use a data flow to iterate over them (see the reference thread).

    3. String manipulation: if you only need the filenames, you can store the Get Metadata childItems output in a variable as a string and extract the names with string functions (see the reference thread).
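
    To make option 1 (and the staging file from option 2) concrete, below is a minimal Python sketch of what such an Azure Function could do with the azure-storage-blob SDK. The connection string, container names, and output blob name are placeholders for illustration, not values from your environment:

    # Sketch of an Azure Function body: list every blob name in the source container
    # and write the names to a single staging CSV that a Copy activity or data flow
    # can load. Requires the azure-storage-blob package; all names are placeholders.
    from azure.storage.blob import BlobServiceClient

    CONNECTION_STRING = "<storage-connection-string>"   # e.g. from app settings or Key Vault
    SOURCE_CONTAINER = "<source-container>"             # container holding the ~2,000 JSON files
    STAGING_CONTAINER = "<staging-container>"           # container for the filename list
    STAGING_BLOB = "filenames.csv"

    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    source = service.get_container_client(SOURCE_CONTAINER)

    # list_blobs() pages through the container, so it is not bound by the
    # per-call limits of the Get Metadata activity mentioned above.
    names = [blob.name for blob in source.list_blobs()]

    csv_content = "FileName\n" + "\n".join(names)
    service.get_container_client(STAGING_CONTAINER).upload_blob(
        STAGING_BLOB, csv_content, overwrite=True
    )

    print(f"Wrote {len(names)} filenames to {STAGING_CONTAINER}/{STAGING_BLOB}")

    The resulting filenames.csv can then be loaded into the Dedicated SQL Pool table with a single Copy activity (or COPY INTO, per the documentation linked below), so no per-file ForEach loop is needed.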

    Alternatively, use a Get Metadata activity to list the filenames and pass its childItems output to a ForEach activity. Inside the ForEach, use a Copy activity with the blob as the source (filename taken from item().name) and the SQL table as the sink.

    Refer to: https://video2.skills-academy.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-lake-store
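
    If you prefer to skip the staging file entirely, a variant of the same function could insert the names straight into the table. The sketch below assumes the pyodbc package, an installed ODBC driver, and a hypothetical target table dbo.SourceFileNames (FileName NVARCHAR(400)); adjust the server, database, and credentials to your environment:

    # Sketch: list the blob names and insert them directly into the Dedicated SQL Pool
    # table, one parameterized row per filename (each value stays well under any
    # NVARCHAR length limit). All connection details and names are placeholders.
    import pyodbc
    from azure.storage.blob import BlobServiceClient

    STORAGE_CONNECTION = "<storage-connection-string>"
    SOURCE_CONTAINER = "<source-container>"
    SQL_CONNECTION = (
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=<workspace>.sql.azuresynapse.net;"
        "Database=<dedicated-sql-pool>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
    )

    service = BlobServiceClient.from_connection_string(STORAGE_CONNECTION)
    rows = [(blob.name,) for blob in service.get_container_client(SOURCE_CONTAINER).list_blobs()]

    conn = pyodbc.connect(SQL_CONNECTION)
    cursor = conn.cursor()
    cursor.executemany("INSERT INTO dbo.SourceFileNames (FileName) VALUES (?)", rows)
    conn.commit()
    conn.close()

    For a couple of thousand rows this is fine; for much larger file counts, the staged-file plus COPY INTO approach from the link above scales better.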

    Hope this answer helps! Please let us know if you have any further queries. I’m happy to assist you further.


    Please "Accept the answer” and “up-vote” wherever the information provided helps you, this can be beneficial to other community members.
