Azure Data Lake Gen2 - Use Case Advice

Eric Maibach 1 Reputation point
2020-10-30T14:58:46.347+00:00

I am collecting weather data (history and forecast) from a third-party web service. Since there will be a lot of data, and it will not be accessed often, I was planning to use Azure Data Lake Gen2 with blob storage and store the data in JSON files. My thought is that this will be cheaper than an Azure SQL database.

I have read that it is best to have larger files in Data Lake. The amount of data collected each hour is relatively small, so I was thinking of just having one file for each month. But this means that when I collect data each hour, I need to add it to the current month's file. What is the best way to do this? Should I read the file, add my new data to the data from the file, and then overwrite the file with the combined data? That seems the easiest, but it also seems inefficient. Is there a better way to do this, some way to append? Or should I just live with having smaller files and create a new file each hour?

And, is this even an appropriate use case for Data Lake?
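
To make the append question concrete, here is a minimal sketch of the "append instead of read-and-overwrite" idea, assuming the data is stored as newline-delimited JSON (one record per line) rather than as a single JSON array. That format is an assumption on my part, not something decided above; it matters because a JSON array cannot be extended without rewriting the file, while a line-per-record file can. The sketch uses a local file as a stand-in for the Data Lake file, and the names `monthly_path`, `append_records`, `read_records`, and the folder layout are hypothetical. Against the real service, I believe the `azure-storage-file-datalake` Python SDK offers `append_data` and `flush_data` on `DataLakeFileClient` for the same purpose, but that is not shown or tested here.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Iterable


def monthly_path(root: Path, ts: datetime) -> Path:
    """Hypothetical layout: one JSON Lines file per month, e.g. <root>/2020/2020-10.jsonl."""
    return root / f"{ts:%Y}" / f"{ts:%Y-%m}.jsonl"


def append_records(root: Path, ts: datetime, records: Iterable[dict]) -> Path:
    """Append each record as one JSON line; the existing content is never read or rewritten."""
    path = monthly_path(root, ts)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" creates the file if it is missing and otherwise writes at the end.
    with path.open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def read_records(path: Path) -> list:
    """Read the whole month back as a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Each hourly run would call `append_records` with that hour's readings, so the cost of a write depends on the size of the new data and not on how large the month's file has become.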

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.