Friday, June 19, 2009

Change Data Capture - SQL Server 2008

One of the new feature of SQL Server 2008 is the ability to capture DML operations that occurs on an exsisting table. Change Data Capture (CDC) is considered to be a duplicate log information which is stored in a table format. The best part is now you query your CDC table to find out the various changes that your table has undergone. Even you can now map your query related to time.



The following are the steps to be enjoy the feature of CDC.



Step 1: Enable CDC on Database



First CDC has to enabled on the DB level.



USE [master]
GO
CREATE DATABASE SQL2008CDC
GO
USE SQL2008CDC
GO
EXEC sys.sp_cdc_enable_db




To enable CDC on the DB the stored proedure sys.sp_cdc_enable_db has to be executed. To check the DB has CDC enabled run the following T-SQL statement. If you see the is_cdc_enabled column for SQL2008CDC database, the value will be 1.



SELECT [name] AS DBName, is_cdc_enabled from sys.databases



Once the CDC is enabled you will find some tables created inside the System Tables. They will help to be keep track of the changes that happens inside the tables. Will look into these tables later. You will also find the cdc user that has been created inside your DB.



Step 2: Ok! Now, we need enable CDC on the tables we would like to capture the DML operations. To enable CDC on the table make sure the SQL Server Agent is running.

CREATE TABLE dbo.Employee
(
EmpID int NOT NULL PRIMARY KEY,
EmpName nvarchar(100) NOT NULL,
EmpEmail nvarchar(100) NOT NULL
)
GO

EXEC sys.sp_cdc_enable_table
@source_schema=N'dbo',
@Source_name=N'Employee',
@role_name=N'cdc_admin',
@capture_instance=N'dbo_Employee',
@supports_net_changes=1;



Once you have enabled CDC on the table you will find two jobs created inside Jobs folder in SQL Server Agent, capture job and clean job.

If you look at the CDC table it has the same schema of that of the Employee table. You can also restrict the columns in CDC table by specifying the columns inside the stored procedure sp_cdc_enable_enable. Run the following query to check if the table has CDC enabled.

SELECT name, is_tracked_by_cdc from sys.tables

Step 3:

I will be now inserting 4 records into the table Employee

INSERT INTO dbo.Employee VALUES (1,'George Clooney','georger.clonney@oceans13.com')
INSERT INTO dbo.Employee VALUES (2,'Ben Stiller','ben.stiller@oceans13.com')
INSERT INTO dbo.Employee VALUES (3,'Tom Cruise','tom.cruise@oceans13.com')
INSERT INTO dbo.Employee VALUES (4,'Juila Roberts','juila.roberts@oceans13.com')

Look into the Employee and CDC table.

SELECT * FROM dbo.Employee
SELECT * FROM cdc.dbo_Employee_CT

You will see the CDC has a column _$start_lsn (log sequence number) which is the unique across transactions that occur on the Employee table. The _$operation column value 2 indicates it is an insert to the Employee table.

Now I will be deleting a value a record from the table Employee.

DELETE FROM dbo.Employee WHERE EmpID = 4

Now you will find _$operation value as 1 which indicates it is a delete statement.


UPDATE dbo.Employee SET EmpName = 'Brad Pitt' WHERE EmpID = 2
UPDATE dbo.Employee SET EmpName = 'Matt Damon' WHERE EmpID = 3

You will see for each and every update 2 rows gets inserted in the CDC table. Now the _$operation value as 3 (before image)and 4 (after image). Both rows will have the same lsn.

Now can I have mentioned earlier now you can query your CDC table.

DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Employee');
SET @to_lsn = sys.fn_cdc_get_max_lsn();
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Employee(@from_lsn, @to_lsn, 'all')
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Employee(@from_lsn,@to_lsn,'all')

Get all change function will list out all the DML operation that has been done with the Employee table. For example if a record in the Employee table has been updated thrice, all the three updates will be reflected in the CDC table.

Get net change will sum up changes into a single record. So record will appear as it is in the Employee table.

One can also map the lsn to time and query it based on a specific date.


DECLARE @begin_time DATETIME, @end_time DATETIME,
@from_lsn binary(10), @to_lsn binary(10)
SET @begin_time = GETDATE()
SET @end_time = GETDATE()-1
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal',@begin_time);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal to',@end_time);
SELECT @begin_time, @end_time, @from_lsn, @to_lsn
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Employee(@from_lsn, @to_lsn, 'all')
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Employee(@from_lsn,@to_lsn,'all')

Hope you have got an idea on how CDC works.

FAQ:

Will it affect the performance?

Absolutely Not!! CDC will have to read the information from the log file and so while updating the original table it need not wait or care about CDC.

Well, that is the end of it. You can now post in your query and soon will come up with some detailed information on CDC.

No comments:

Post a Comment