SSIS- Load Slowly changing dimension ( SCD) Type 1 [ Upsert ]

Scenario: 

Lets say we have to load a dimension table from text file. Our business Key is SSN. We need to insert new records depending upon values of SSN column, If any new then we need to insert this records. If SSN already existing in Table then we need to find out if any other column is changed from Source columns values. If that is true then we have to update those values.

Solution : 

Here is the source file we will be using. Create a text file on your desktop with below data

SSN,FirstName,LastName,Address
000-000-001,Aamir,Shahzad,NJ USA
000-000-002,John,River,NC USA

Create table in your database by using below script which we will be using as destination.

CREATE TABLE dbo.DimCustomer
  (
     CustomerID INT IDENTITY(1, 1),
     SSN        VARCHAR(11),
     FirstName  VARCHAR(50),
     LastName   VARCHAR(50),
     Address    VARCHAR(100),
     LoadDate   DATETIME,
     UpdateDate DATETIME,
     LoadedBy   VARCHAR(50),
     ModifiedBy VARCHAR(50)
  )


INSERT PART
--------------------------------------------------------------------------------------------------------------------------------- 

Step 1: 
Create new Package in BIDS. Bring Data Flow task in Control flow Pane. Rename your Data Flow task to "DFT_Load_Dim_Customer"


Step 2: 
Drag Flat File Source and make new connection to your Sourcefile that you have created on desktop by using above data.

Step 3:
Drag lookup Tranformation and connect to Flat File source and Configure Lookup as shown in figure. you have to make connect to your Database where your dbo.DimCustomer Table exists.

Step 4:
Go to Columns Tab and connect SSN from Available input columns to Available Lookup Columns

Step 5:
Choose the Cache Mode, In our case we have selected Full Cache( Store all records in memory before matching them). Also we want to redirect rows which do not match from Source to Look-up records.

Step 6:
Bring Derived Column Transformation and connect to Lookup No Match Output. These are the new records and we want to insert them into out Dim Table.

Step 7:
As our dimension table has two columns LoadTime and LoadedBy , we used Derived Column Transformation to generate those columns.

Step 8 :
Bring OLE DB Destination to Data Flow pane and connect with Derived Column transformation. Choose the Connection Manager and dbo.DimCustomer Table and then map as shown below


UPDATE PART
------------------------------------------------------------------------------------------------------------
Till here we are done with inserting new records depending upon the value of SSN. If a new record is coming in flat file, we will be able to insert that into destination (DimCustomer) table if that SSN does not exists in our dimension table.

Now there could be scenarios where SSN will be the same but last name of person has changed due to getting marry Or correction in First Name or Address has changed. In those cases we have to update records in our dbo.DimCustomer Table. Here are steps

Step 1:
Bring Lookup Transformation to Data Flow pane and connect with already existing lookup with " match output" arrow and configure as shown below.



We are comparing all the columns ( SSN, FirstName, LastName and Address). The goal is to find out any record which is updated.


Step 2:
Connect the "Lookup No Match Output" to Derived Column and Add two columns in Derived column UpdateDate and ModifiedBy.

Step 3:
Bring OLE  DB Command Transformation and connect with Derived column. Write your update statement to update records which are changed.

Update dbo.DimCustomer
SET FirstName=?,
LastName=?,
Address=?,
UpdateDate=?,
ModifiedBy=?
WHERE SSN=?

To see more details how to configure OLE DB Command transformation, visit
http://sqlage.blogspot.com/2013/08/ssis-how-to-use-ole-db-command.html


Map the columns

Complete Package:
Our complete package should look like this. OLE DB Command Transformation perform operation for each row that make it slow transformation. We you have a lot of updates. You might want to insert the data into staging table and use TSQL for insert and update.

3 comments:

  1. Hi Tech Brothers, I have gone through all the tutorials and articles posted by you and they are very informative and based on real scenarios. Just awesome.
    I love the way they explain about any particular topic through a practical example irrespective of it being SSIS or SSRS etc.

    ReplyDelete
  2. Do you have any article based on incremental approach to load the data from source to destination without lookup transformation,SCD Transformation or Merge statement where datasets is about 30 millions rows ?

    ReplyDelete