Incremental Statistics – My New Favorite Thing
So many new features get added with each release of SQL it is hard to use them all. Sometimes, I often don't realize how helpful a new feature is until I see it in action. Incremental Stats was introduced in SQL 2014, but I recently had the opportunity to help a customer implement it and it has been a huge win! It is my new favorite SQL thing for now… I will fall in love with something else here soon.
The scenario: A data mart with all the right best practices used. A solid data model, all fact tables are partitioned, data gets loaded into a partition aligned stage table, indexes built on the stage table, partition switching is used to instantly swap the prepared data into the fact table, purging of old data is done via partition switching as well. Once a partition is swapped into the fact table, the data never changes.
The problem is that after the partition is swapped into the fact table, there must be a stats update for the whole table to reflect the data in the new partition. None of the other data changed… why should stats need to be updated for the whole table? Here is where incremental stats comes in. When you run an update stats command with INCREMENTAL=ON, stats are generated at the partition level, and then each partition of stats updates the overall stats for the table. After this is done once for the table, you can then update stats for a particular partition or set of partitions. This is a huge time saver! Note that you must also use RESAMPLE, therefore the sample rate that was used for the initial table level UPDATE STATISTICS will need to be used for all partition level stats updates.
This simple change shaved a ton of time off daily load processes.
Here is a small example to illustrate my point:
I created a relatively small partitioned table and loaded some records into 4 of the partitions
Showing the trace output below, you can see that the full table stats update used to turn on incremental stats took 8.5 seconds. Just updating the stats on partition 6 took 1.2 seconds. Updating stats for partitions 5 and 6 took a little less than half the time of the whole table, which lines up with the fact that it is a little less than half the data in the table. Note that I did full scan just to have a longer run time with this small set of data.
When you are dealing with a large amount of data and many more partitions, the time savings can be substantial.