Hello Jörg Lang,
Switching to Azure Data Lake Storage Gen2 and using Parquet files can indeed offer benefits in processing speed and storage efficiency. Parquet is a columnar storage file format that works well with Azure Data Lake Storage and is known for its compression and query performance, especially on large datasets.
To get max(last_modified) from a Parquet file, you can use a tool like Apache Spark or Azure Databricks to read the file and run the aggregation; both have built-in support for reading and processing Parquet.
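As a minimal PySpark sketch, assuming your data has a column literally named last_modified and substituting your own container and storage account in the abfss:// path:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("max-last-modified").getOrCreate()

# Read a single file or an entire folder of Parquet files from ADLS Gen2.
df = spark.read.parquet(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/data/"
)

# Aggregate to find the most recent last_modified value.
max_ts = df.agg(F.max("last_modified").alias("max_last_modified")).collect()[0][0]
print(max_ts)
```

On Azure Databricks a spark session already exists in the notebook, so you can drop the builder line and run the rest as-is.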
Regarding new files per run: yes, it is common practice to write a new set of Parquet files for each data ingestion run, especially when dealing with delta (incremental) datasets. This keeps each run's changes separate, helps manage data versions, and gives you a historical record of data changes.
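A hedged sketch of that pattern, stamping each run's output folder with a UTC timestamp (the path is again a placeholder, and delta_df stands in for whatever DataFrame holds the current run's changed rows):

```python
from datetime import datetime, timezone
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("write-delta-run").getOrCreate()

# Stand-in for the rows that changed since the last run.
delta_df = spark.createDataFrame(
    [(1, "2024-05-01T10:15:00")],
    ["id", "last_modified"],
)

# Stamp each ingestion run so every delta lands in its own folder.
run_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
output_path = (
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/"
    f"deltas/run={run_id}/"
)

# Fail rather than silently overwrite if the folder already exists.
delta_df.write.mode("errorifexists").parquet(output_path)
```

Using a run= folder naming scheme means Spark can later read the whole deltas/ tree in a single call and still tell the runs apart.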
When you create new Parquet files for each run, you can maintain a metadata file or manifest that records the maximum last_modified timestamp in each file. That way you can retrieve the most recent last_modified value across all your Parquet files without scanning every file.
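One way to sketch such a manifest, assuming it lives at its own path and holds one row per ingested file (both paths below are placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("manifest-lookup").getOrCreate()

manifest_path = (
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/manifest/"
)

# After each run, append one row recording the new file and its newest timestamp.
entry = spark.createDataFrame(
    [("deltas/run=20240501T101500Z/", "2024-05-01T10:15:00")],
    ["file_path", "max_last_modified"],
)
entry.write.mode("append").parquet(manifest_path)

# Later, the overall high-water mark is one small scan of the manifest
# instead of a full read of every delta file.
manifest = spark.read.parquet(manifest_path)
high_water_mark = manifest.agg(F.max("max_last_modified")).collect()[0][0]
print(high_water_mark)
```

Storing the timestamps as ISO 8601 strings keeps max() comparisons correct lexicographically; a proper timestamp column works just as well.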
I hope this helps. Please let me know if you have any further questions.