TSQL - How to Load Slowly Changing Dimension Type 2 ( SCD2) By Using TSQL Merge Statement


Scenario : 

Let's say we have dbo.Customer Table in OLTP Database or in Staging Database from which we have to load our Dim.Customer Slowly changing type 2 dimension by using TSQL Merge Statement.


Solution:


Step 1: 
Create Source Table dbo.Customer and insert couple of records.
USE TEST1
GO
CREATE TABLE dbo.Customer
  (
     CustomerID   INT IDENTITY(1, 1),
     CustomerCode UNIQUEIDENTIFIER,
     FirstName    VARCHAR(50),
     LastName     VARCHAR(50),
     [Address]    VARCHAR(200),
     Phone        VARCHAR(10)
  )

INSERT INTO dbo.Customer 
VALUES ( NEWID(),'Aamir','Shahzad',' 2119 XYZ NEW YORK USA','5054141969')
INSERT INTO dbo.Customer 
VALUES ( NEWID(),'Najaf1','Ali',' 32211 XYZ NEW Jersey USA','5054141969')


Step 2:
Create Dimension Table. We used CustomerCode as business key from source.

CREATE TABLE Dim.Customer
  (
     DimCustomerID INT IDENTITY(1, 1),
     CustomerCode  UNIQUEIDENTIFIER,
     FirstName     VARCHAR(50),
     LastName      VARCHAR(50),
     [Address]     VARCHAR(200),
     Phone         VARCHAR(10),
     StartDT       DATETIME,
     EndDT         DATETIME,
     IsLatest      BIT
  )



Step 3: 
Create Merge Statement, the statement can be used in SQL Server agent job or It can be used in SSIS Package ( Execute SQL Task). If you wish , you can create stored procedure for this statement and use in SQL Server Agent or SSIS Package to populate Dim.Customer Table.


-- Insert Records from Inner Merge as they they are update and 
-- need to insert as new record
INSERT INTO TestDB.Dim.Customer (CustomerCode, FirstName, LastName,
[Address], Phone, IsLatest,StartDT)
SELECT CustomerCode, FirstName, LastName,[Address], Phone,1,GETDATE() AS StartDT FROM(
-- Start of Merge Statement
MERGE INTO TestDB.Dim.Customer AS DST
USING Test1.dbo.Customer AS SRC
ON (SRC.CustomerCode = DST.CustomerCode
-- Business Key for Matching
-- WHEN Business Key does not match with Destination Records, Then insert
WHEN NOT MATCHED THEN
INSERT (CustomerCode, FirstName, LastName, [Address], Phone, IsLatest,StartDT)
VALUES (SRC.CustomerCode, SRC.FirstName, SRC.LastName,
        SRC.[Address],SRC.Phone, 1,GETDATE())
-- Business Key Matched but One or More than One filed has changed in Source)
WHEN MATCHED AND IsLatest = 1 AND ( 
-- Comparing if any of the field is different from 
--Source in Destination WHEN Business Key matches.
 ISNULL(DST.FirstName,'') != ISNULL(SRC.FirstName,'')
 OR ISNULL(DST.LastName,'') != ISNULL(SRC.LastName,'')
 OR ISNULL(DST.Address,'') !=ISNULL(SRC.Address,'')
 OR ISNULL(DST.Phone,'') != ISNULL(SRC.Phone,'')
 )
--UPDATE the record. Expire the record by setting IsLatest=0 and EndDate=getdate()
THEN UPDATE
SET DST.IsLatest = 0,
    DST.EndDT = GETDATE()
-- Use the Source records which needs to be inserted in Dim because of Update.
 OUTPUT SRC.CustomerCode, SRC.FirstName, SRC.LastName, SRC.[Address],SRC.phone,
        $Action AS MergeAction) AS MRG
         WHERE MRG.MergeAction = 'UPDATE';

Note: 
I have used StartDT and EndDT as datetime, most of the time the dimensions are loaded once a day and these columns can be changed to date datatype.



2 comments: