SSIS - How To Remove Duplicate Record/s and Keep Single Record in Data Flow Task [ Sort Transformation]

Scenario:

We are loading data from Flat File that can has duplicate records in it. We want to remove the duplicate records and keep only single record for those duplicate records.


Solution :

In SSIS, Sort Transformation can help us to perform this requirement. Sort Transformation will remove the duplicate records and let only single record pass through. If you have small file, Sort Transformation will work just fine but if you have millions of records and your computer (Server) has small RAM capacity (Random Access Memory) then you might want to load these records into some staging table and then write TSQL to extract unique records. 

Step 1 : 
Bring Data Flow task to Control Flow Pane and inside Data Flow Task Drag Flat File Source and create connection to your source file.

Click on Columns to see if all columns are correctly mapped


Step 2: 
Drag Sort Transformation and Connect Flat File Source to it. Double Click on Sort Transformation and Choose the columns to Sort. Also Check the Check Box : Remove rows with duplicate sort values and then click OK.


Step 3:
Drag Multicast Transformation and connect Sort Transformation to it. Put Data Viewer between Multicast Transformation and Sort Transformation to view the output.

The duplicate copies of records are removed and only single record is passed through Sort Transformation as we can see in Data Viewer display.

3 comments: