Conor vs. UNIQUE in Index Definitions
I gave a talk to the Austin-area PASS group last night on B-Tree indexing in SQL Server, and I received a question about whether to make clustered indexes UNIQUE or not (assuming the data is unique on the key columns). The expectation was that this should perform better when the index was UNIQUE.
Let’s back up a minute and see things from the perspective of the Query Processor. Someone comes in and creates an index on col1 of table1. They don’t make it UNIQUE but they put data in the column that is unique. Then you compile the query “SELECT DISTINCT col1 from Table1”. The QP generates a plan that remains valid even if other data is inserted while that plan is in the procedure cache. So, it can’t assume anything about the uniqueness of that column because you could come in and insert a value that is non-unique. If, on the other hand, you created a UNIQUE constraint/index, then things are very different! Since the generated plan is valid as long as the schema definition for the table remains the same, the QP knows that the definition of the table precludes any duplicate values from being inserted. So, it can generate a plan that doesn’t bother to do any grouping (DISTINCT is implemented using the a group by operator in SQL). Go ahead and try it – you’ll see the plan difference.
In my view, this is really more of a question about your data model – if you know something is distinct, you can get the system to enforce this rule for you and make sure that your application behaves correctly. So, when defining your tables, you should have a holistic view of how the tables relate to each other so that you can identify where you should or should not have constraints (or indexes – however you want to define them).
One case where I see customers sometimes explicitly avoiding the definition of constraints is in data warehouses. If your data is read-only and you already “know” that your data is unique, you may want to skip the space overhead or time overhead of having indexes to enforce these constraints. Now, you still have the issue with the Optimizer not being able to do fancy query rewrites based on uniqueness, but your average data warehouse query is often expensive and rarely do these kinds of rewrites apply in any meaningful way that would impact query performance (other factors dominate).
Happy Querying!
Conor
Comments
- Anonymous
November 07, 2009
Hi Conor, There's another element you forgot to mention and that's if you have non clustered indexes - their keys include the clustered index key. If the clustered index key is not unique, SQL makes it unique by adding a few extra bytes ("uniqifier") per row - so in my opinion, if you know that the columns you use in your clustered index key are unique (and that they stay unique), enforce this constraint, so that your non clustered indexes become slimmer as well. This also resulted in extra index rebuilds in older versions (when you rebuilt a non unique clustered index, all non clustered indexes were also rebuilt), but not in SQL Server 2005 +. See Paul Randal's post for more info: http://blogs.msdn.com/sqlserverstorageengine/archive/2007/06/07/what-happens-to-non-clustered-indexes-when-the-table-structure-is-changed.aspx Cheers, S. Neumann