ADF - How to identify the number of partition needed, while using SAP source?

Santhosh Kumar 6 Reputation points
2020-10-14T11:28:03.437+00:00

Hi Team,
My source is SAP and my table is having billions of records, i'm using CopyData activity with Partition option as "On Int" to retrieve the data faster. I'm passing "20000" as "Max partitions number", the problem is, if the data is available for only 10000 partitions, then for the remaining 10000 partitions, it's creating empty files, which is making more difficult when running other jobs.

  1. How to identify, what is the maximum number of partitions needed upfront for an SAP table data in ADF?
  2. Can ADF restrict itself in creating the empty files, if the data is available for only few partitions?

Thanks!

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,568 questions
{count} vote

1 answer

Sort by: Most helpful
  1. MartinJaffer-MSFT 26,081 Reputation points
    2020-10-26T22:59:45.3+00:00

    @Santhosh Kumar I got a response.

    • The SAP table currently does not know most of the data falls into 2000-2004. Since you know this is the case, you should split the current copy activity into separate 'runs'. For the current one, restrict the upper bound to 2004 and lower bound to 2001 where most of the data is, and specify the number of partitions.
    • For the range 1950-2000 put as another run without the partition number set, because there is not much data.
    • Similarly for 2004-2021, do not set the partition number, as there is not as much data here either.

    While this strategy would improve the performance of your copy, I feel it does not address your original ask.

    1 person found this answer helpful.
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.