Maintaining Unique Indexes

Consider the following schema:

CREATE TABLE T (PK INT PRIMARY KEY, A INT, B INT)
CREATE INDEX TA ON T(A)
CREATE UNIQUE INDEX TB ON T(B)

INSERT T VALUES (0, 0, 0)
INSERT T VALUES (1, 1, 1)

Now suppose we run the following update statement:

UPDATE T SET A = 1 - A

This update statement affects the clustered index and the non-clustered index TA.  The plan is pretty much what you might expect:

  |--Clustered Index Update(OBJECT:([T].[PK__T__15502E78]), OBJECT:([T].[TA]), SET:([T].[A] = [Expr1003]))
       |--Compute Scalar(DEFINE:([Expr1016]=[Expr1016]))
            |--Compute Scalar(DEFINE:([Expr1016]=CASE WHEN [Expr1004] THEN (1) ELSE (0) END))
                 |--Compute Scalar(DEFINE:([Expr1003]=(1)-[T].[A], [Expr1004]=CASE WHEN [T].[A] = ((1)-[T].[A]) THEN (1) ELSE (0) END))
                      |--Top(ROWCOUNT est 0)
                           |--Clustered Index Scan(OBJECT:([T].[PK__T__15502E78]))

This is a typical narrow update plan.  The single clustered index update operator maintains both the clustered index and the non-clustered index TA.  The plan includes compute scalars that detect whether each row of the non-clustered index really needs to be updated.  I wrote about plans just like this one in this post.

Now suppose we run the same update statement but this time we modify column B:

UPDATE T SET B = 1 - B

Suddenly the plan is much more complex:

Rows   Executes
2      1        |--Index Update(OBJECT:([T].[TB]), SET:([PK1022] = [T].[PK],[B1023] = [T].[B]))
2      1             |--Collapse(GROUP BY:([T].[B]))
4      1                  |--Sort(ORDER BY:([T].[B] ASC, [Act1021] ASC))
4      1                       |--Filter(WHERE:(NOT [Expr1019]))
4      1                            |--Split
2      1                                 |--Clustered Index Update(OBJECT:([T].[PK__T__15502E78]), SET:([T].[B] = [Expr1003]))
2      1                                      |--Compute Scalar(DEFINE:([Expr1019]=[Expr1019]))
0      0                                           |--Compute Scalar(DEFINE:([Expr1019]=CASE WHEN [Expr1004] THEN (1) ELSE (0) END))
0      0                                                |--Compute Scalar(DEFINE:([Expr1003]=(1)-[T].[B], [Expr1004]=CASE WHEN [T].[B] = ((1)-[T].[B]) THEN (1) ELSE (0) END))
2      1                                                     |--Top(ROWCOUNT est 0)
2      1                                                          |--Clustered Index Scan(OBJECT:([T].[PK__T__15502E78]))

What's going on?  This time we updated a unique index.  The SQL Server storage engine enforces the uniqueness of the index at all times.  It does not permit the query processor to insert any duplicate rows into the index at any time.  The query processor, on the other hand, must ensure that an update statement succeeds so long as it does not leave behind any uniqueness violations upon its completion.  Based on this rule, the above update statement must succeed.

Let's consider what would happen if SQL Server simply tried to update the non-clustered index TB using the same simple plan that it used when we updated column A.   To execute this plan the server must scan the rows in the table and update them one at a time.  Suppose the server chose to update the row with PK=0 first.  In that case, it would attempt to change the value of B from 0 to 1.  Unfortunately, there is already a row in the index with B=1.  The storage engine would enforce the uniqueness of the index and the update would fail.  The update would similarly fail if the server chose to update the row with PK=1 first.  It would seem that there is no way to execute this update!

Fortunately, SQL Server has a solution.  The basic idea is simple.  Instead of updating the key columns of a unique index which can cause "false" uniqueness violations, the query processor reorganizes the update such that the non-key columns are updated instead.  This reorganization is implemented by the split, sort, and collapse operators.  Let's take a closer look at how this works.

In our example, we begin with two rows to update:

PK

B_old

B_new

0

0

1

1

1

0

The split operator transforms the updates into deletes followed by inserts:

Action

PK

B

Delete

0

0

Insert

0

1

Delete

1

1

Insert

1

0

Notice that as indicated by the above STATISTICS PROFILE output, we now have 4 rows.

The sort operator reorders the inserts and deletes by the key columns of the non-clustered index (in this case by column B).  If there is a delete and an insert that share the same key value, the delete sorts before the insert.  The results of the sort are:

Action

PK

B

Delete

0

0

Insert

1

0

Delete

1

1

Insert

0

1

The collapse operator combines adjacent delete and insert pairs that share the same key value into a single update:

Action

PK_old

PK_new

B

Update

0

1

0

Update

1

0

1

In this example, the 4 rows collapse back into 2 rows leaving only updates.  Notice that the updates no longer change column B (which could lead to a false uniqueness violation) but rather change column PK which is not a unique key of index TB and which cannot fail due to a uniqueness violation.  Note also that in general it is not necessarily the case that all deletes and inserts collapse into updates.  The results of the collapse operator may include any combination of inserts, updates, and deletes.

Finally, the index update operator executes the update operations output by the collapse operator.

Keep in mind that just because a query plan includes split, sort, and collapse operators does not mean that it may not result in an actual uniqueness violation.  It simply ensures that there will be no false uniqueness violations.  Moreover, SQL Server generates plans with split, sort, and collapse operators anywhere that there is a risk of a false uniqueness violation.  There may or may not be an actual uniqueness violation.  For example, the following update statement which (with the current data set) would not result in a false uniqueness violation generates a nearly identical plan:

UPDATE T SET B = B + 10

On the other hand, the following update statements can only update a single row.  SQL Server is clever enough to recognize that these statements cannot generate false uniqueness violations and generates simpler plans without the split, sort, and collapse operators:

UPDATE T SET B = B + 10 WHERE PK = 0
UPDATE TOP (1) T SET B = B + 10

Comments

  • Anonymous
    September 08, 2007
    Hi Craig,This is very helpful.I have a small question.You:The sort operator reorders the inserts and deletes by the key columns of the non-clustered index (in this case by column A).Q: Should it say: in this case by column B , instead?

  • Anonymous
    September 11, 2007
    Yes, it should have read have read "column B".  I've corrected the post.  Thanks for pointing out my error!

  • Anonymous
    January 30, 2008
    A few months ago, I wrote a post describing how SQL Server maintains unique indexes while avoiding false

  • Anonymous
    January 30, 2008
    A few months ago, I wrote a post describing how SQL Server maintains unique indexes while avoiding false

  • Anonymous
    February 27, 2008
    In a prior post , I introduced the notion that update plans consist of two parts: a read cursor that

  • Anonymous
    January 31, 2013
    You mentioned: "On the other hand, the following update statements can only update a single row.  SQL Server is clever enough to recognize that these statements cannot generate false uniqueness violations and generates simpler plans without the split, sort, and collapse operators" Can you please elaborate on whether there are ways to avoid the there amigos during SQL Inserts Merge Statements? It seems for updates, they can sometimes be avoided via singleton operations such as primary key or TOP 1s Thanks

  • Anonymous
    February 03, 2013
    Insert statements do not generate plans with the split, sort, and collapse operators regardless of how many rows are inserted.  Merge statements that update a unique index and/or both insert and delete from the target table will generate plans that include the split, sort, and collapse operators unless (as in the examples above) the optimizer can deduce that only a single row will be affected.  It's worth noting that merge statements due to their very nature generally produce more complex plans than standard insert, update, and delete statenents. HTH Craig