Friday, June 19, 2009

Merge Statement - SQL Server 2008

Another enhancement in the Business Intellegence part is the introduction of Merge statement in SQL Server. Many a times we have faced challenges in keeping the Transactional database and the datawarehouse(Historical data) objects in sync. The update on the datawarhouse (DW) has to happen with the changes that occur on the transactional database on day to day bases. In practise, the DW database is not updated very frequently. Before a DW is updated it has to compared with the source (transactional db) and all the changes in the source will have to reflect in the DW database. The question is, Why to write separate (insert, update, delete) DML statement for keeping tables in sync?



Now we introduce the concept of merging the 3 DML statements into a single T-SQL. That's the Merge Statement.



Following is the syntax for the Merge Statement.




MERGE
USING
ON
[ WHEN MATCHED [ AND ]
THEN ]
[ WHEN [TARGET] NOT MATCHED [ AND ]
THEN ]
[ WHEN SOURCE NOT MATCHED [ AND ]
THEN ]




So, let us see the various senarios using an example.




IF OBJECT_ID('tblCustomer','U') IS NOT NULL
DROP TABLE dbo.tblCustomer
GO
CREATE TABLE dbo.tblCustomer
(CustomerID int NOT NULL, CustomerName varchar(20) NOT NULL,City varchar(20) NOT NULL)
GO


IF OBJECT_ID('tmpCustomer','U') IS NOT NULL
DROP TABLE dbo.tmpCustomer
GO
CREATE TABLE dbo.tmpCustomer
(CustomerID int NOT NULL, CustomerName varchar(20) NOT NULL,City varchar(20) NOT NULL)


GO
INSERT INTO tblCustomer(CustomerID,CustomerName,City)
VALUES (1,'James','Chennai'),
(2,'John','Chennai'),
(3,'Debbie','Bangalore')


GO
INSERT INTO tmpCustomer(CustomerID,CustomerName,City)
VALUES (1,'James','Chennai'),
(3,'Debbie','Delhi'),
(4,'Nancy','Bangalore'),
(5,'Mala','Chennai')


GO
SELECT * FROM tblCustomer
SELECT * FROM tmpCustomer




In this example tblCustomer is the target table which will updated based source tmpCustomer and the update on the target table will happen based on the CustomerID column.



Scenario 1:


If the CustomerID matches with the target table then the city has to updated on the target table

Scenario 2:

If the CustomerID is in the source but not in the target table, then the record has to be inserted into target table. This happens when a new prouct gets introduced into the market and the target table is unware of the change.

Scenario 3:

If the CustomerID in the target table does not match records in the source table, then the record in the target table can be deleted. Most of times the data in DW does not get deleted for such reasons.

MERGE tblCustomer as Main
USING tmpCustomer tmp
ON Main.CustomerID = tmp.CustomerID
WHEN MATCHED THEN -- Scenario 1
UPDATE SET City = tmp.City
WHEN NOT MATCHED BY SOURCE THEN -- Scenario 3
DELETE
WHEN NOT MATCHED BY TARGET THEN -- Scenario 2
INSERT (CustomerID,CustomerName,City)
VALUES (tmp.CustomerID,tmp.CustomerName,tmp.City);


Thats it!! We are done. On top it you also have an option to investigate your target table to check for the changes. You need to use an action keyword with your Merge statement. Delete the values of both source and target table and then insert the values again. Then run the following query,


MERGE tblCustomer as Main
USING tmpCustomer tmp
ON Main.CustomerID = tmp.CustomerID
WHEN MATCHED THEN
UPDATE SET City = tmp.City
WHEN NOT MATCHED BY SOURCE THEN
DELETE
WHEN NOT MATCHED BY TARGET THEN
INSERT (CustomerID,CustomerName,City)
VALUES (tmp.CustomerID,tmp.CustomerName,tmp.City)
OUTPUT $action,tmp.CustomerID;

Do write in your queries and comments. Will catch up with some other features soon.

Change Data Capture - SQL Server 2008

One of the new feature of SQL Server 2008 is the ability to capture DML operations that occurs on an exsisting table. Change Data Capture (CDC) is considered to be a duplicate log information which is stored in a table format. The best part is now you query your CDC table to find out the various changes that your table has undergone. Even you can now map your query related to time.



The following are the steps to be enjoy the feature of CDC.



Step 1: Enable CDC on Database



First CDC has to enabled on the DB level.



USE [master]
GO
CREATE DATABASE SQL2008CDC
GO
USE SQL2008CDC
GO
EXEC sys.sp_cdc_enable_db




To enable CDC on the DB the stored proedure sys.sp_cdc_enable_db has to be executed. To check the DB has CDC enabled run the following T-SQL statement. If you see the is_cdc_enabled column for SQL2008CDC database, the value will be 1.



SELECT [name] AS DBName, is_cdc_enabled from sys.databases



Once the CDC is enabled you will find some tables created inside the System Tables. They will help to be keep track of the changes that happens inside the tables. Will look into these tables later. You will also find the cdc user that has been created inside your DB.



Step 2: Ok! Now, we need enable CDC on the tables we would like to capture the DML operations. To enable CDC on the table make sure the SQL Server Agent is running.

CREATE TABLE dbo.Employee
(
EmpID int NOT NULL PRIMARY KEY,
EmpName nvarchar(100) NOT NULL,
EmpEmail nvarchar(100) NOT NULL
)
GO

EXEC sys.sp_cdc_enable_table
@source_schema=N'dbo',
@Source_name=N'Employee',
@role_name=N'cdc_admin',
@capture_instance=N'dbo_Employee',
@supports_net_changes=1;



Once you have enabled CDC on the table you will find two jobs created inside Jobs folder in SQL Server Agent, capture job and clean job.

If you look at the CDC table it has the same schema of that of the Employee table. You can also restrict the columns in CDC table by specifying the columns inside the stored procedure sp_cdc_enable_enable. Run the following query to check if the table has CDC enabled.

SELECT name, is_tracked_by_cdc from sys.tables

Step 3:

I will be now inserting 4 records into the table Employee

INSERT INTO dbo.Employee VALUES (1,'George Clooney','georger.clonney@oceans13.com')
INSERT INTO dbo.Employee VALUES (2,'Ben Stiller','ben.stiller@oceans13.com')
INSERT INTO dbo.Employee VALUES (3,'Tom Cruise','tom.cruise@oceans13.com')
INSERT INTO dbo.Employee VALUES (4,'Juila Roberts','juila.roberts@oceans13.com')

Look into the Employee and CDC table.

SELECT * FROM dbo.Employee
SELECT * FROM cdc.dbo_Employee_CT

You will see the CDC has a column _$start_lsn (log sequence number) which is the unique across transactions that occur on the Employee table. The _$operation column value 2 indicates it is an insert to the Employee table.

Now I will be deleting a value a record from the table Employee.

DELETE FROM dbo.Employee WHERE EmpID = 4

Now you will find _$operation value as 1 which indicates it is a delete statement.


UPDATE dbo.Employee SET EmpName = 'Brad Pitt' WHERE EmpID = 2
UPDATE dbo.Employee SET EmpName = 'Matt Damon' WHERE EmpID = 3

You will see for each and every update 2 rows gets inserted in the CDC table. Now the _$operation value as 3 (before image)and 4 (after image). Both rows will have the same lsn.

Now can I have mentioned earlier now you can query your CDC table.

DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Employee');
SET @to_lsn = sys.fn_cdc_get_max_lsn();
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Employee(@from_lsn, @to_lsn, 'all')
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Employee(@from_lsn,@to_lsn,'all')

Get all change function will list out all the DML operation that has been done with the Employee table. For example if a record in the Employee table has been updated thrice, all the three updates will be reflected in the CDC table.

Get net change will sum up changes into a single record. So record will appear as it is in the Employee table.

One can also map the lsn to time and query it based on a specific date.


DECLARE @begin_time DATETIME, @end_time DATETIME,
@from_lsn binary(10), @to_lsn binary(10)
SET @begin_time = GETDATE()
SET @end_time = GETDATE()-1
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal',@begin_time);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal to',@end_time);
SELECT @begin_time, @end_time, @from_lsn, @to_lsn
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Employee(@from_lsn, @to_lsn, 'all')
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Employee(@from_lsn,@to_lsn,'all')

Hope you have got an idea on how CDC works.

FAQ:

Will it affect the performance?

Absolutely Not!! CDC will have to read the information from the log file and so while updating the original table it need not wait or care about CDC.

Well, that is the end of it. You can now post in your query and soon will come up with some detailed information on CDC.

Thursday, June 18, 2009

Hi All.. My first post!!

Hi All... This is my first post. This very blog has been created for write ups about my experience with Microsoft exsisting and upcoming versions SQL Server and .NET framework. I will also do some wrting on exciting day to day experience that makes me to think over and over again. Hope you would key in your valuable suggestions to make this blog better.

Thanks,
Shankar