Real-Time Operational Analytics: Filtered nonclustered columnstore index (NCCI)

The previous blog https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/03/04/real-time-operational-analytics-dml-operations-and-nonclustered-columnstore-index-ncci-in-sql-server-2016/ showed how updateable nonclustered columnstore index (NCCI) processes DML operations (i.e. Insert, Update and Delete) and why NCCI is more expensive to maintain than a traditional rowstore btree index. SQL Server 2016 provides following two options to reduce the impact of NCCI on the transactional workload.

  • Filtered nonclustered columnstore index (NCCI)
  • Option to delay compression in columnstore index

In this blog, I will focus on filtered nonclustered columnstore index.

Filtered index is not a new option. It was first introduced in SQL Server 2008 and now with SQL Server 2016, it is supported on NCCI. The picture below shows a NCCI with and without filtered predicate.

ncci-fil-unfiltered

The picture on the left shows an NCCI without filtered predicate and the one on the right shows an NCCI with filtered condition. You will notice portion of disk-based table marked ‘hot’ which represents the data on which there is no NCCI. In other words, any changes made by transactional workloads in ‘hot’ part, there is no overhead of maintaining the NCCI. However, when you run an analytics query that uses NCCI, the SQL Server optimizer will combine the ‘hot’ data automatically requiring no changes to the query. Since there is no NCCI on the ‘hot’ data, the analytics query will be slower but still it will provide real-time analytics. In a nutshell, the filtered NCCI trades off analytics query performance to minimize the impact on transactional workload. You may wonder how SQL Server can identify the ‘hot’ rows (i.e. the ones that don’t qualify with the filtered condition) efficiently without requiring a full table scan?  The trick is to have a regular btree clustered index (CI) on the columns used for the filtering condition. You could use a regular NCI to do the same but it will be very inefficient and therefore NOT recommended.  

Let us take an example to illustrate these points. The schema and the data load is identical to what I had used in the example in the blog  https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/03/04/real-time-operational-analytics-dml-operations-and-nonclustered-columnstore-index-ncci-in-sql-server-2016/ except for the NCCI which uses a filtered condition ‘orderstatus = 5’. The example is related to Order Processing application where rows are marked ‘cold’ once customer has received the shipment, shown by the condition (orderstatus = 5).  Note, there is also a clustered btree index with key column ‘orderstatus’. CREATE NONCLUSTERED COLUMNSTORE INDEX orders_ncci ON orders (accountkey, customername, purchaseprice, orderstatus) where orderstatus = 5
Just like before, I load 3 million rows and create the NCCI. What you see that even though we had inserted 3 million rows, the total rows in NCCI are 2849881. The remaining rows (i.e. the ‘hot’ rows) don’t meet the filtered predicate.

filtered-ncci-DMV

Once the data has been loaded, now let us run one query to show that the generated query plan picks both the rows in NCCI and also the ones that don’t meet the filtered condition automatically using clustered btree index. The numbers of rows flowing thru btree clustered index are (3000000-2849881) = 150119 filtered-ncci-query-plan

In summary, the filtered NCCI allows you to minimize impact on the transactional workload while still providing you performant analytics queries in real-time. The filtered NCCI works best if the clustered btree index naturally aligns to the filtered predicate condition. If this does not work for your application, then the next choice is to use compression delay option with NCCI. I will describe that in a future blog.

Thanks

Sunil

Comments

  • Anonymous
    March 05, 2016
    The comment has been removed
    • Anonymous
      March 05, 2016
      No. Please note that you can only create a filtered NCI which is not efficient to find rows in a range
      • Anonymous
        March 06, 2016
        Thank you, Sunil for your quick information.Best Regards,Yoshihiro Kawabata
    • Anonymous
      July 17, 2016
      The sample actually is btree Clustered index and NCCI. I don't see any benefit of creating additional filtered btree index
  • Anonymous
    April 06, 2016
    what about having NCCI indexes in a secondary replica only on an alwayson configuration?if I want to use 1 replica for reporting and the primary for OLTP, I want to have the index in place on the replica only.
    • Anonymous
      April 07, 2016
      At this time, the physical structures in the database on the secondary must match the structures on the primary, so in order to use NCCI on a secondary, you need to create it on the primary. Note, however, that we did a lot of work to minimize the maintenance cost of the NCCI, plus you can create a filtered NCCI to further reduce the impact.