ADF - How to identify the number of partitions needed, while using SAP source?

Santhosh Kumar 6

Hi Team,
My source is SAP, and my table has billions of records. I'm using the Copy Data activity with the partition option set to "On Int" to retrieve the data faster. I'm passing "20000" as the "Max partitions number". The problem is that if data exists for only 10000 partitions, ADF creates empty files for the remaining 10000 partitions, which makes running other jobs more difficult.

How can I identify, up front, the maximum number of partitions needed for an SAP table in ADF?
Can ADF avoid creating empty files when data is available for only a few partitions?

Thanks!

MartinJaffer-MSFT 26,081 Reputation points

2020-10-15T19:56:04.077+00:00

Hello @Santhosh Kumar and welcome to Microsoft Q&A. I would like to address your two questions separately if possible.

My understanding is that when partitioning, you choose a column to partition on. The number of unique values between the partitionLowerBound and the partitionUpperBound is what it tries to use as the number of partitions. If the number of unique values is greater than the "Max partitions number", it distributes the remainder among the existing partitions.

I think you ran into the opposite problem: you have fewer unique values than the "Max partitions number".

In SAP partitioning documentation, the max partitions per table is 16000. I'm not 100% sure if this applies to interactions with ADF.

I will need to ask internally what the intended behavior is for when actual partitions used < Max partitions.
Santhosh Kumar 6 Reputation points

2020-10-16T11:32:58.673+00:00

Hi @MartinJaffer-MSFT ,

Let me provide an example. For an SAP table with a date column, suppose I pass the values below:

LowerBound: 19500101
UpperBound: 20210101
Max. partitions number: 1000

ADF simply creates 1000 partition files based on the lower and upper bound range.
For example:
partition_19500101_19500630_000.txt
partition_19500630_19501231_000.txt and so on..

Because of this, if 80% of the data falls within 20000101 - 20040101, the other partition files are created empty (only the header exists).
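The effect described above can be reproduced with a small sketch. It assumes the bound range is simply cut into equal-width integer ranges, which may not be exactly how ADF splits it (the file names above suggest the split may follow calendar dates). The function names and the sample values are hypothetical and only serve to show why skewed data leaves most ranges empty.

```python
from bisect import bisect_left


def equal_width_bounds(lower, upper, max_partitions):
    """Split the inclusive range [lower, upper] into contiguous integer ranges.

    Assumption: equal-width splitting; this is an illustration, not ADF's
    documented algorithm.
    """
    span = upper - lower + 1
    count = min(max_partitions, span)
    width, extra = divmod(span, count)
    bounds = []
    start = lower
    for i in range(count):
        # The first `extra` ranges get one additional value each.
        size = width + (1 if i < extra else 0)
        bounds.append((start, start + size - 1))
        start += size
    return bounds


def count_non_empty(bounds, values):
    """Count the ranges that contain at least one of the given values."""
    ordered = sorted(values)
    non_empty = 0
    for low, high in bounds:
        i = bisect_left(ordered, low)  # first value >= low
        if i < len(ordered) and ordered[i] <= high:
            non_empty += 1
    return non_empty


# Hypothetical sample: every row falls between 2000 and 2003.
bounds = equal_width_bounds(19500101, 20210101, 1000)
sample_dates = [20000115, 20010630, 20021231, 20031101]
print(len(bounds), count_non_empty(bounds, sample_dates))
```

With these sample values, only a handful of the 1000 ranges contain any data, so every other range would produce a header-only file.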

I feel that ADF should not create empty files when no data is available for a partition range. Also, is there a way to identify, up front, the maximum partition number needed to extract an SAP table's data, so that I can pass that number for partitioning?

Thanks
Santhosh Kumar 6 Reputation points

2020-10-21T06:29:11.76+00:00

Hello @MartinJaffer-MSFT ,

Good Day!

Any update on my query above?

Thanks
MartinJaffer-MSFT 26,081 Reputation points

2020-10-26T21:19:41.547+00:00

No, sorry, I'll ping them harder until I get a response now.

1 answer

MartinJaffer-MSFT 26,081 Reputation points

2020-10-26T22:59:45.3+00:00
@Santhosh Kumar I got a response.

The SAP table connector currently has no way of knowing that most of the data falls into 2000-2004. Since you know this is the case, you should split the current copy activity into separate 'runs'. For the current one, restrict the lower bound to 2000 and the upper bound to 2004, where most of the data is, and specify the number of partitions.

For the range 1950-2000, set up another run without the partition number set, because there is not much data.

Similarly for 2004-2021, do not set the partition number, as there is not as much data here either.
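The three runs described above can be written down as a small plan. This is only a sketch: `CopyRun` and `plan_runs` are hypothetical names, not ADF objects, and the partition count of 200 is an arbitrary placeholder. The sketch also assumes non-overlapping inclusive bounds so that rows on a boundary date are not copied twice; the ranges in the answer share their end points.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CopyRun:
    """One copy activity run (hypothetical structure, not an ADF type)."""
    lower_bound: int
    upper_bound: int
    max_partitions: Optional[int]  # None means: run without partitioning


def plan_runs(full_lower: int, full_upper: int,
              dense_lower: int, dense_upper: int,
              dense_partitions: int) -> List[CopyRun]:
    """Partition only the dense range; copy the sparse ranges unpartitioned."""
    if not (full_lower <= dense_lower <= dense_upper <= full_upper):
        raise ValueError("dense range must lie inside the full range")
    runs = []
    if full_lower < dense_lower:
        # Sparse range before the dense one.
        runs.append(CopyRun(full_lower, dense_lower - 1, None))
    runs.append(CopyRun(dense_lower, dense_upper, dense_partitions))
    if dense_upper < full_upper:
        # Sparse range after the dense one.
        runs.append(CopyRun(dense_upper + 1, full_upper, None))
    return runs


# Bounds from the example in this thread; 200 is a placeholder value.
for run in plan_runs(19500101, 20210101, 20000101, 20040101, 200):
    print(run)
```

Each resulting entry would map to its own copy activity, with the partition settings filled in only for the dense range.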

While this strategy would improve the performance of your copy, I feel it does not address your original ask.