Hi - I need some design recommendations for my requirement. I have a list of parquet files stored in a data lake (ADLS). I need to execute some SQL statements against them, e.g. select max(amount), count(*), and return the results so that I can write them back to my on-premises SQL Server. Please note that I am looking for a solution that does not stage the data as an intermediate step in a Dedicated SQL Pool via Copy or external tables.
- Using the OPENROWSET() function in the Serverless SQL pool only helps me analyse the data, but I can't return the result set and store it on my on-premises server using data pipelines (see the sketch after this list for what I run today). Please correct me if my understanding is not right here.
- Data pipelines allow transferring data from ADLS to other stores, but they don't have the capability to query the files and return results that I can sink to the on-premises SQL Server with a Copy activity. So it looks like we can't achieve this without first storing the data in a SQL Pool either.
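For reference, this is roughly the statement I run against the serverless pool today, sketched here via pyodbc so the shape of the result set is clear. The server name, database, credentials, and storage path are placeholders, not my real values:

```python
import pyodbc

# Connect to the Synapse Serverless SQL endpoint (placeholder values).
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace-ondemand.sql.azuresynapse.net;"
    "DATABASE=master;UID=myuser;PWD=mypassword;Encrypt=yes"
)

# The aggregates I need, run directly over the parquet files in the lake.
query = """
    SELECT MAX(amount) AS max_amount, COUNT(*) AS row_count
    FROM OPENROWSET(
        BULK 'https://mystorage.dfs.core.windows.net/mycontainer/data/*.parquet',
        FORMAT = 'PARQUET'
    ) AS src;
"""

row = conn.cursor().execute(query).fetchone()
print(row.max_amount, row.row_count)
conn.close()
```

This works fine interactively, but I don't see a supported way to feed that result set into a pipeline that sinks to the on-premises server.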
One more option I am considering: can I pass the parquet file name to a Spark notebook, and does it have the capability to run the queries and return the results so that I can store them on the on-premises server? I would use a data pipeline Copy activity to automate the workflow. Is it possible to implement it the way I am thinking? Please confirm.
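To make the Spark option concrete, this is the kind of notebook I have in mind. It is only a sketch under my assumptions: the file path, staging path, and column name (amount) are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# The pipeline would pass this in as a notebook parameter (placeholder path).
file_path = "abfss://mycontainer@mystorage.dfs.core.windows.net/data/sales.parquet"

df = spark.read.parquet(file_path)

# The aggregates I need, e.g. max(amount) and count(*).
result = df.agg(
    F.max("amount").alias("max_amount"),
    F.count("*").alias("row_count"),
)

# Write the (tiny) result set to a staging folder so a Copy activity
# can pick it up and sink it to the on-premises SQL Server.
staging_path = "abfss://mycontainer@mystorage.dfs.core.windows.net/staging/agg_results"
result.coalesce(1).write.mode("overwrite").parquet(staging_path)
```

The idea is that the notebook emits only the aggregated rows, so the Copy activity would move a handful of rows to the on-premises server instead of the raw files.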
Please also let me know any other possible options to achieve my objective.

Thanks