SQL Server Change Data Capture (CDC)

Introduction

"Change data capture is designed to capture insert, update, and delete activity applied to SQL Server tables, and to make the details of the changes available in an easily consumed relational format. The change tables used by change data capture contain columns that mirror the column structure of a tracked source table, along with the metadata needed to understand the changes that have occurred." -MSDN Article

Real World Scenario

There are two scenarios where Change Data Capture can play an important part:
      1.  For data auditing
      2.  For Use with Data warehousing
 
In the first scenario, CDC can be very useful for tracking data changes for auditing purposes.
A concrete example for this is a bank application where a depositor would query
what has happened to his account  and how his account has changed over time
or in a much simpler term a historical view of his count.

Another scenario where CDC can play an important part is an accounting application
where a dispute may usually happen.

CDC, however, is designed for a short time audit because it has a retention period. However, you
can disable retention and cleansing of CDC data. The default retention for CDC is three days.
Any application that would require an immediate access to the tail-end part of the transaction
would be a great candidate for immediate audit requirements.

The second scenario where CDC can play an important part is in data warehousing. CDC table can be
used as data source for the ETL rather than the actual OLTP this is due to the fact that  CDC can contain
a more historical data as compared to the regular OLTP tables. Another good reason is that you can do the
ETL without disturbing the base tables.

Performance

The strongest selling point of CDC is performance. As you can see from the diagram below.
CDC tables are taken directly from the database log without disturbing your OLTP tables.
The downside, of course, is that the logs cannot be trimmed unless CDC has completed its task.
Although CDC is making use of transaction logs, the database is not required to be in full recovery mode.

 
Figure 1. CDC Architecture. Diagram courtesy of *MSDN
*

System Requirements

CDC works on the following version of SQL Server:
   1. Enterprise edition
   2. Developer edition

Change Data Capture Step-by-Step

 

Step #1.  Create a database to be enabled for CDC

                  *  Create database cdctestdb*

Step #2. Enable CDC on the database

* use cdctestdb
 exec sys.sp_cdc_enable_db*

Step #3. Verify that CDC is enabled on the database

                            select name, is_cdc_enabled from sys.databases

                         CDC will also create system table in the database

            The System table which have been created under CDC schema are listed below:

cdc.captured_columns – This table returns result for list of captured column.
cdc.change_tables – This table returns list of all the tables which are enabled for capture.
cdc.ddl_history – This table contains history of all the DDL changes since capture data enabled.
**cdc.index_columns **– This table contains indexes associated with change table.
cdc.lsn_time_mapping – This table maps LSN number (for which we will learn later) and time.

Step #4.  Create a Table

CREATE TABLE [dbo].[students](
 [student_id] [int] IDENTITY(1,1) NOT NULL,
 [lastname] [varchar](50) NULL,
 [firstname] [nchar](50) NULL,
 [middle_name] [nchar](50) NULL,
 CONSTRAINT [PK_students] PRIMARY KEY CLUSTERED
(
 [student_id] ASC
))

What's new

Step #4. Enable CDC on the Table

*exec sys.sp_cdc_enable_table
@source_schema=N'dbo',
@source_name=N'students',
@role_name=NULL
*

Step #5. Verify that CDC is enabled in the table

select name,is_tracked_by_cdc  from sys.tables
where is_tracked_by_cdc=1

results

name         is_tracked_by_cdc
------        -------------------
students    1

Step #6.   Insert records

             insert into students (lastname,firstname,middle_name)
              values
            ('student1ln','student1fn','student1mn'),
            ('student2ln','student2fn','student2mn'),
            ('student3ln','student3fn','student3mn')

Step #7.  Check if the inserted record are being tracked

Notice that SQL Server will create a system table to track changes.

Run a select query on the cdc table to see audit entries

Step #8.  Disable CDC

USE  cdctestdb
GO 
EXEC sys.sp_cdc_disable_db 
GO

What's new in Change Data Capture in SQL Server 2012?

A new feature of  Change Data Capture in SQL Server 2012 is the addition of  Change Data Capture
in SQL server Data Capture for Oracle database. For more information please see the link below.

http://blogs.msdn.com/b/mattm/archive/2012/03/26/cdc-for-oracle-in-sql-server-2012.aspx