Adventures in TSQL: Using CTEs when performing DELETE/UPDATE operations
It seems in quite a few applications of late there has been the requirement to batch process, in a distinct order, a series of entries from a table; namely a processing queue. Without delving in broker services this is easily managed using standard TSQL. The requirement is to delete a predefined number of entries from a processing queue, and selecting the deleted entries so they can be used to manage downstream processing.
Table Definition
Before getting into the guts of TSQL here is the definition for the TABLE that will be used during the discussion, and a script to populate the table:
CREATE TABLE [dbo].[ActivityQueue]
(
[ActivityId] int IDENTITY(1,1) PRIMARY KEY NOT NULL,
[ActivityType] int NOT NULL,
[ActivityMessage] xml NOT NULL
);
GO
DECLARE @idx int = 0;
WHILE (@idx < 100000)
BEGIN
INSERT INTO [dbo].[ActivityQueue] ([ActivityType], [ActivityMessage])
VALUES (@idx, '<process>Just a test message</process>');
SET @idx = @idx + 1;
END
GO
Deletion without Ordering
If one was not concerned with the order by which data is pulled from a queue the OUPUT clause of a DELETE operation easily solves the requirement; as show here:
DECLARE @processSize int = 1000;
DELETE TOP(@processSize) [dbo].[ActivityQueue]
OUTPUT deleted.[ActivityId], deleted.[ActivityType], deleted.[ActivityMessage];
If one looks at the query plan for this operation it is a simple Clustered Index scan and delete operation:
This leads to the simple statistics:
- Table 'ActivityQueue'. Scan count 1, logical reads 18
However if one needs to delete from the activity table in a specified order things get a little messier.
Deletion with Ordering
The normal process when one needs to delete entries from a table in a defined order is to first select the required entities, placing them into a temporary table. The temporary table is then used to perform the DELETE operation by performing an INNER JOIN:
DECLARE @processSize int = 1000;
INSERT INTO @queue ([ActivityId], [ActivityType], [ActivityMessage])
SELECT TOP(@processSize) [ActivityId], [ActivityType], [ActivityMessage]
FROM [dbo].[ActivityQueue] WITH (UPDLOCK)
ORDER BY [ActivityId] ASC;
DELETE [dbo].[ActivityQueue]
OUTPUT deleted.[ActivityId], deleted.[ActivityType], deleted.[ActivityMessage]
FROM [dbo].[ActivityQueue] AS AQ
INNER JOIN @queue AS QU ON QU.[ActivityId] = AQ.ActivityId;
The INSERT operation first gives us the following query plan:
Followed by the DELETE and SELECT operation:
Looking at the statistics for both operations we get:
- Table '#17F790F9'. Scan count 0, logical reads 2055
- Table 'ActivityQueue'. Scan count 1, logical reads 18
- Table 'ActivityQueue'. Scan count 0, logical reads 3000
- Table '#17F790F9'. Scan count 1, logical reads 14
As you can see this is a massive jump in read operations. If we also have to select the processed records in the correct order we will end up scanning the temporary table once more:
SELECT [ActivityId], [ActivityType], [ActivityMessage]
FROM @queue
ORDER BY [ActivityId] ASC;
So how can CTEs help in this process?
Deletion using CTEs
With the addition of a CTE we have the option of combining the use of the OUPTPUT statement with the definition of which entities to delete:
DECLARE @processSize int = 1000;
WITH [QueueData] ([ActivityId], [ActivityType], [ActivityMessage])
AS (
SELECT TOP(@processSize) [ActivityId], [ActivityType], [ActivityMessage]
FROM [dbo].[ActivityQueue]
ORDER BY [ActivityId] ASC
)
DELETE [QueueData]
OUTPUT deleted.[ActivityId], deleted.[ActivityType], deleted.[ActivityMessage];
The CTE allows us to define which entities we are deleting along with the necessary ORDER BY clause. If one looks at the resulting query plan you will see we are back to the simple initial case:
As you may have already determined this also means our IO operations are also back down:
- Table 'ActivityQueue'. Scan count 1, logical reads 18
CTEs are also useful for ordered UPDATE operations. One of the limitations of UPDATE TOP() statement is that an ORDER BY cannot be specified. This means statements such as the following are not permitted:
UPDATE TOP (@processSize) [dbo].[ActivityQueue]
SET [ActivityType] = 2
OUTPUT deleted.[ActivityId], deleted.[ActivityType], deleted.[ActivityMessage]
WHERE [ActivityType] = 0
ORDER BY [ActivityId] ASC
However, once again, CTEs offer an easy solution.
Table Definition
Before getting into the guts of TSQL here is the definition for the TABLE that will be used during the discussion, and a script to populate the table.
USE [Development]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ActivityQueue]') AND type in (N'U'))
DROP TABLE [dbo].[ActivityQueue];
GO
CREATE TABLE [dbo].[ActivityQueue]
(
[ActivityId] int IDENTITY(1,1) PRIMARY KEY NOT NULL,
[ActivityType] int NOT NULL,
[ActivityMessage] xml NOT NULL
);
GO
CREATE INDEX IDX_ActivityQueue_ActivityType
ON [dbo].[ActivityQueue]
(
[ActivityType] ASC
);
DECLARE @idx int = 0;
WHILE (@idx < 100000)
BEGIN
INSERT INTO [dbo].[ActivityQueue] ([ActivityType], [ActivityMessage])
VALUES (@idx % 10, '<process>Just a test message</process>');
SET @idx = @idx + 1;
END
GO
Updates with Ordering
The normal process when one needs to updates entries from a table in a defined order, such that they are output for processing, is to first select the required entities, placing them into a temporary table, and locking those entries. The temporary table is then used to perform the UPDATE operation by performing an INNER JOIN:
DECLARE @queue TABLE ([ActivityId] int PRIMARY KEY);
DECLARE @processSize int = 100;
INSERT INTO @queue ([ActivityId])
SELECT TOP(@processSize)
[ActivityId]
FROM [dbo].[ActivityQueue] WITH (UPDLOCK)
WHERE [ActivityType] = 0
ORDER BY [ActivityId] ASC;
UPDATE [dbo].[ActivityQueue]
SET [ActivityType] = 2
OUTPUT deleted.[ActivityId], deleted.[ActivityType], deleted.[ActivityMessage]
FROM [dbo].[ActivityQueue] AS AQ
INNER JOIN @queue AS QU ON QU.[ActivityId] = AQ.ActivityId;
The INSERT operation first gives us the following query plan:
Followed by the UPDATE and SELECT operation:
Once again using CTEs allows us to greatly simplify this process.
Update using CTEs
CTEs once again allow us, in a single statement, to define an ordered set and update this set with an OUTPUT specification.
WITH [QueueData] ([ActivityId], [ActivityType], [ActivityMessage])
AS (
SELECT TOP(@processSize) [ActivityId], [ActivityType], [ActivityMessage]
FROM [dbo].[ActivityQueue]
WHERE [ActivityType] = 0
ORDER BY [ActivityId] ASC
)
UPDATE [QueueData]
SET [ActivityType] = 2
OUTPUT deleted.[ActivityId], deleted.[ActivityType], deleted.[ActivityMessage];
If one looks at the resulting query plan you will see that once again it is greatly simplified:
Although CTEs are commonly demonstrated to solve the expanding hierarchy problem they can also be used to solve other TSQL challenges.