Scenario:
We have create a SSIS
Package for Upsert(Insert/Update). We get csv file with millions of records
with (Id,Name,Address columns). If the record come with new Id , we need to
insert that record in dbo.Customer table(id, name, address) and for existing IDs
we need to update those records.
After doing some analysis,
we got to know that the number of records those need to be updated on daily
basis are minimum 100,000 per day. To perform above task we can use Lookup
Transformation and find out existing and non existing records. Any non-existing
IDs can be directly inserted into dbo.Customer table but for update we have to
use OLE DB Command transformation. OLE DB Command transformation is slow, it
will update one row at a time and for 100,000 records it will take long
time.
How about inserting the
records into some staging table and write TSQL Statement to Insert/update
records? Good idea! It will be fast and easy to do. But my Architect do not
want to create a new table :(
Solution:
Ok, How about we create Temp
table and then use it in our package to perform the above task and once done,
the Temp table will be gone!
Let's start with step by
step approach
Step 1:
Prepare Source.csv file on
desktop by using below data
Id,Name,Address
1,Aamir,ABC ADDRESS
2,Raza,Test Address
3,July, 123 River Side CA
4,Robert,540 Rio Rancho NM
Step 2:
Create dbo.Customer Table by
using below script
USE TestDB GOCREATE TABLE dbo.Customer ( ID INT, Name VARCHAR(100), Address VARCHAR(100) )
Step 3:
Create SSIS Package to load csv file into dbo.Customer Table.( Insert new records and update existing)
Create OLE DB Connection to the database where your dbo.Customer table exists. Right Click on Connection and then click properties or Click on Connection and press F4 to go to properties.
Set RetainSameConnection=True.
Fig 1: Set RetainSameConnection to True for OLE DB Connection
Step 4:
Create ##Temp table by using Execute SQL Task as shown below by using
Create Table ##Temp(ID INT, Name VARCHAR(100),ADDRESS VARCHAR(100))
Fig 2: Create ##Temp table by using Execute SQL Task
Step 5:
Bring Data Flow Task to Control Flow Surface and then connect Execute SQL task to it. Inside Data Flow task bring Flat File Source and make connection to Source.csv file that you have created in Step 1.
Drag Lookup Transformation and configure as shown below. Our goal is to Insert any record which Id does not exist in dbo.Customer table and if ID exists we want to update that records. Instead of using OLE DB Command Transformation, we will insert records which needs to be update in ##Temp table inside Data Flow Task.
Fig 3: Configure Lookup Transformation ( Redirect rows to no match output)
Fig 4: Choose Id from dbo.Customer for lookup
Fig 5: Map the Source Id to dbo.Customer.ID for lookup
Step 6:
Bring OLE DB Destination Transformation from Data Flow Items as shown. Join No Match Output ( new records) of Lookup to OLE DB Destination and choose destination Table (dbo.Customer).
Fig 6: Insert new records by using No Match Output of Lookup Transformation
As we do not want to use OLE DB Command transformation for update inside Data Flow Task. Let's write all records those need to be update into ##Temp table by using OLE DB Destination. We will not be able to see ##Temp table in drop down in OLE DB Destination. Here are two steps we need to take
i) Create a variable with name ##Temp as shown below
Fig 7: TableName variable holding Temp Table Name
ii) Go to SSMS and create ##Temp table ( if you would not create this table, you will not be able to map the columns in OLE DB Destination)
Create Table ##Temp(ID INT, Name VARCHAR(100),ADDRESS VARCHAR(100))
Bring the OLE DB Destination and map to TableName Variable as shown below.
Fig 8: Configure OLE DB Destination to use TableName variable for Destination Table Name.
Fig 9: Map the Source Columns to ##Temp Table Columns
After all the configuration our Data Flow will look like below figure. I renames the transformation to provide better picture about what we are doing in this Data Flow Task.
Fig 10: Data Flow Task with ##Temp Table Destination.
Step 7:
Go to Control Flow Surface and Drag Execute SQL Task to write update statement.
UPDATE DST
SET DST.Name=SRC.Name
,DST.ADDRESS=SRC.ADDRESS
FROM dbo.Customer DST
INNER JOIN ##Temp SRC
ON DST.ID=SRC.ID
Fig 11: Execute SQL Task to Update Dbo.Customer from ##Temp
Our final SSIS Package will look like below
Fig 12: Insert/Update Package by using Temp Table for Updates
If we try to run the SSIS Package, It might complain that ##Temp does not exists. Go to package properties by right clicking in Control Flow Pane and Set DelayValidation=True. By setting DelayValidation we are asking the package not to validate any objects as ##Temp table does not exist at this point and it will be created later in Package.
Fig 13: Set Delay Validation=True
Run the Package couple of times and check the data in dbo.Customer table. Data should be loaded. Now let's go to Source.csv file and change some values for Name and Address columns and run the package one more time to make sure, Update logic is working fine.
Here is the data after update.
Id,Name,Address
1,Aamir1,Test ADDRESS
2,Raza1,Test Address
3,July, 123 River Side CA USA
4,Robert,540 Rio Rancho NM
Fig 14: Package Execution After Updating Records in Source.csv file
As we can see that the records are updated, where ever we made changes in Name and Address values.
Fig 16: dbo.Customer data after Upsert
This comment has been removed by the author.
ReplyDeleteI think this post about SSIS operations can actually help you a lot in knowing every detail and aspect of it.Thanks.
ReplyDeleteThanks for sharing this information. This is really useful. Keep doing more.
ReplyDeleteSpoken English Classes in Chennai
Spoken English Class in Chennai
Spoken English in Chennai
IELTS Training in Chennai
IELTS Chennai
Best English Speaking Classes in Mumbai
Spoken English Classes in Mumbai
IELTS Mumbai
IELTS Center in Mumbai
IELTS Coaching in T Nagar
If we already created ##temp in SSMS then how its going to create thru package showing error temp table already exits
ReplyDeleteWhen we move the package to different environment, this will work fine.
DeleteGetting invalid object name ##TEMP in OLEDB destination
ReplyDeleteYou need to SET DelayValidation to TRUE on package properties
DeleteHi Sir,
ReplyDeleteI am learning SSIS and SSrs from you techbrotherIT.
Really compare to other tutor, Your one is very excellent and easy to learn.
This try this example.
But as other said, \
Its is not working and through error only.
Thanks
Gangatharan
al_ganga@yahoo.com