TSQL - How to Load Slowly Changing Dimension Type 2 ( SCD2) By Using TSQL Merge Statement


Scenario : 

Let's say we have dbo.Customer Table in OLTP Database or in Staging Database from which we have to load our Dim.Customer Slowly changing type 2 dimension by using TSQL Merge Statement.


Solution:


Step 1: 
Create Source Table dbo.Customer and insert couple of records.
USE TEST1
GO
CREATE TABLE dbo.Customer
  (
     CustomerID   INT IDENTITY(1, 1),
     CustomerCode UNIQUEIDENTIFIER,
     FirstName    VARCHAR(50),
     LastName     VARCHAR(50),
     [Address]    VARCHAR(200),
     Phone        VARCHAR(10)
  )

INSERT INTO dbo.Customer 
VALUES ( NEWID(),'Aamir','Shahzad',' 2119 XYZ NEW YORK USA','5054141969')
INSERT INTO dbo.Customer 
VALUES ( NEWID(),'Najaf1','Ali',' 32211 XYZ NEW Jersey USA','5054141969')


Step 2:
Create Dimension Table. We used CustomerCode as business key from source.

CREATE TABLE Dim.Customer
  (
     DimCustomerID INT IDENTITY(1, 1),
     CustomerCode  UNIQUEIDENTIFIER,
     FirstName     VARCHAR(50),
     LastName      VARCHAR(50),
     [Address]     VARCHAR(200),
     Phone         VARCHAR(10),
     StartDT       DATETIME,
     EndDT         DATETIME,
     IsLatest      BIT
  )



Step 3: 
Create Merge Statement, the statement can be used in SQL Server agent job or It can be used in SSIS Package ( Execute SQL Task). If you wish , you can create stored procedure for this statement and use in SQL Server Agent or SSIS Package to populate Dim.Customer Table.


-- Insert Records from Inner Merge as they they are update and 
-- need to insert as new record
INSERT INTO TestDB.Dim.Customer (CustomerCode, FirstName, LastName,
[Address], Phone, IsLatest,StartDT)
SELECT CustomerCode, FirstName, LastName,[Address], Phone,1,GETDATE() AS StartDT FROM(
-- Start of Merge Statement
MERGE INTO TestDB.Dim.Customer AS DST
USING Test1.dbo.Customer AS SRC
ON (SRC.CustomerCode = DST.CustomerCode
-- Business Key for Matching
-- WHEN Business Key does not match with Destination Records, Then insert
WHEN NOT MATCHED THEN
INSERT (CustomerCode, FirstName, LastName, [Address], Phone, IsLatest,StartDT)
VALUES (SRC.CustomerCode, SRC.FirstName, SRC.LastName,
        SRC.[Address],SRC.Phone, 1,GETDATE())
-- Business Key Matched but One or More than One filed has changed in Source)
WHEN MATCHED AND IsLatest = 1 AND ( 
-- Comparing if any of the field is different from 
--Source in Destination WHEN Business Key matches.
 ISNULL(DST.FirstName,'') != ISNULL(SRC.FirstName,'')
 OR ISNULL(DST.LastName,'') != ISNULL(SRC.LastName,'')
 OR ISNULL(DST.Address,'') !=ISNULL(SRC.Address,'')
 OR ISNULL(DST.Phone,'') != ISNULL(SRC.Phone,'')
 )
--UPDATE the record. Expire the record by setting IsLatest=0 and EndDate=getdate()
THEN UPDATE
SET DST.IsLatest = 0,
    DST.EndDT = GETDATE()
-- Use the Source records which needs to be inserted in Dim because of Update.
 OUTPUT SRC.CustomerCode, SRC.FirstName, SRC.LastName, SRC.[Address],SRC.phone,
        $Action AS MergeAction) AS MRG
         WHERE MRG.MergeAction = 'UPDATE';

Note: 
I have used StartDT and EndDT as datetime, most of the time the dimensions are loaded once a day and these columns can be changed to date datatype.



SSIS - [How To Use Precedence Constraint]

Scenario:

We are loading data from source file to our SQL Server table.  After the load we want to send email if number of records loaded are more than 5.

Solution:

We will be using Precedence Constraint with expressions to handle this scenario. Here is step by step approach.

Step 1:
Create a variable RowCnt with scope= Package level as shown below. We will be using this variable in expressions to make decision to send email or not.


Step 2:
Bring data flow task on Control Flow pane and then build as shown below. It should be extracting from Source File, Row Count Transformation will be using RowCnt variable to capture  rows moved between source and destination. Use OLE DB Destination to insert data into SQL Table.



 Step 3:
Choose Control Flow and connect Data flow task to Send Mail Task. After connecting them, double click on green connection ( Precedence constrain) and write expression as shown below.



 We are considering two things here, Once Success of Previous Task and Expression should be true. If both will be true then Send Mail Task will be execute else It will stop at Data Flow Task.

SSIS- Load Slowly changing dimension ( SCD) Type 1 [ Upsert ]

Scenario: 

Lets say we have to load a dimension table from text file. Our business Key is SSN. We need to insert new records depending upon values of SSN column, If any new then we need to insert this records. If SSN already existing in Table then we need to find out if any other column is changed from Source columns values. If that is true then we have to update those values.

Solution : 

Here is the source file we will be using. Create a text file on your desktop with below data

SSN,FirstName,LastName,Address
000-000-001,Aamir,Shahzad,NJ USA
000-000-002,John,River,NC USA

Create table in your database by using below script which we will be using as destination.

CREATE TABLE dbo.DimCustomer
  (
     CustomerID INT IDENTITY(1, 1),
     SSN        VARCHAR(11),
     FirstName  VARCHAR(50),
     LastName   VARCHAR(50),
     Address    VARCHAR(100),
     LoadDate   DATETIME,
     UpdateDate DATETIME,
     LoadedBy   VARCHAR(50),
     ModifiedBy VARCHAR(50)
  )


INSERT PART
--------------------------------------------------------------------------------------------------------------------------------- 

Step 1: 
Create new Package in BIDS. Bring Data Flow task in Control flow Pane. Rename your Data Flow task to "DFT_Load_Dim_Customer"


Step 2: 
Drag Flat File Source and make new connection to your Sourcefile that you have created on desktop by using above data.

Step 3:
Drag lookup Tranformation and connect to Flat File source and Configure Lookup as shown in figure. you have to make connect to your Database where your dbo.DimCustomer Table exists.

Step 4:
Go to Columns Tab and connect SSN from Available input columns to Available Lookup Columns

Step 5:
Choose the Cache Mode, In our case we have selected Full Cache( Store all records in memory before matching them). Also we want to redirect rows which do not match from Source to Look-up records.

Step 6:
Bring Derived Column Transformation and connect to Lookup No Match Output. These are the new records and we want to insert them into out Dim Table.

Step 7:
As our dimension table has two columns LoadTime and LoadedBy , we used Derived Column Transformation to generate those columns.

Step 8 :
Bring OLE DB Destination to Data Flow pane and connect with Derived Column transformation. Choose the Connection Manager and dbo.DimCustomer Table and then map as shown below


UPDATE PART
------------------------------------------------------------------------------------------------------------
Till here we are done with inserting new records depending upon the value of SSN. If a new record is coming in flat file, we will be able to insert that into destination (DimCustomer) table if that SSN does not exists in our dimension table.

Now there could be scenarios where SSN will be the same but last name of person has changed due to getting marry Or correction in First Name or Address has changed. In those cases we have to update records in our dbo.DimCustomer Table. Here are steps

Step 1:
Bring Lookup Transformation to Data Flow pane and connect with already existing lookup with " match output" arrow and configure as shown below.



We are comparing all the columns ( SSN, FirstName, LastName and Address). The goal is to find out any record which is updated.


Step 2:
Connect the "Lookup No Match Output" to Derived Column and Add two columns in Derived column UpdateDate and ModifiedBy.

Step 3:
Bring OLE  DB Command Transformation and connect with Derived column. Write your update statement to update records which are changed.

Update dbo.DimCustomer
SET FirstName=?,
LastName=?,
Address=?,
UpdateDate=?,
ModifiedBy=?
WHERE SSN=?

To see more details how to configure OLE DB Command transformation, visit
http://sqlage.blogspot.com/2013/08/ssis-how-to-use-ole-db-command.html


Map the columns

Complete Package:
Our complete package should look like this. OLE DB Command Transformation perform operation for each row that make it slow transformation. We you have a lot of updates. You might want to insert the data into staging table and use TSQL for insert and update.