Real-Time Operational Analytics: Compression Delay option with NCCI and the performance
The previous blog https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/03/06/real-time-operational-analytics-compression-delay-option-for-nonclustered-columnstore-index-ncci/ showed the scenario and benefits of compression-delay option for NCCI. In this blog, I describe an experiment on a transactional workload representing Order Management to measure the effectiveness of compression delay.
Workload
It is an order management application. New orders are inserted and they go through multiple updates over the next 45 minutes and then they becomes dormant. We ran this workload in two distinct phases. In phase-1, the concurrent transaction workload is run for a fixed duration creating/processing new orders. At the end of phase-1, we measure how many orders were processed. In phase-2, we run a fixed number of concurrent analytics queries in a loop and measure how long it took to complete them. This experiment was run both with compression delay (a) 0 and (b) 45 minutes. Here are the results
Compression Delay = 0 minutes (default)
- Total numbers of orders processed = 466 million
- Total on-disk storage for NCCI = 13.8 GB
- Total time taken to run the fixed set of analytics queries = 03:06:09
Compression Delay = 45 minutes
- Total numbers of orders processed = 541 million
- Total on-disk storage for NCCI = 9.9 GB
- Total time taken to run the fixed set of analytics queries = 02:19:53
What it shows that with compression delay, we speed up transactional workload approximately 15%, reduced storage footprint by 30% and improved the performance of analytic queries by 20%+. All this was done by just setting the compression delay without requiring any changes to the application. Note, the compression delay is just an option on the index. You can change it anytime and it does not require index rebuild.
Note, compression delay option is supported on all forms of columnstore indexes.
Thanks
Sunil
Comments
- Anonymous
April 14, 2016
Shouldn't the 45 minutes delay be slower for analytic queries since less data is now sitting in NCCI ?- Anonymous
May 05, 2016
There is a trade-off between amount of data in the NCCI and quality of the index. If rows are typically updated once in the first 45 minutes after inserting, then within those 45 minutes many of the rows in the compressed rowgroup will be marked as deleted. This leads to wasted space and decrease in query performance, since SQL still needs to scan the entire rowgroup, even if half of the rows in it are ignored, since they were deleted. And the more updates happen, the worse the quality of the compressed rowgroups, and eventually a lot of maintenance will need to be done (in the form of REORGANIZE or REBUILD).So for scenarios where rows a frequently updated in a certain period after insert, compression_delay really helps to keep up the quality of the index. And if you're thinking about a period of 45 minutes, still the vast majority of data is going to be older and thus in compressed rowgroups.For scenarios where updates/delete are infrequent, just use compression_delay=0.For scenarios where a predicate can be used as an indicator whether it is likely that a row is going to be updated (e.g., OrderStatus='shipped'), use a filtered NCCI.
- Anonymous