How to Load Only Matching Column Data to SQL Server Table from Excel Dynamically in SSIS Package - SSIS Tutorial

Scenario: Link to Script

You are working as ETL Developer/SSIS developer for Financial firm. You get Excel files in source folder with single sheet that you need to load into SQL Server Table called dbo.Customer in TechbrothersIT Database.
Here is the definition of dbo.Customer Table.

CREATE TABLE dbo.Customer (
    id INT
    ,NAME VARCHAR(50)
    ,dob DATE
    )

There are some problems with Excel Sheet data
1) Sometime you get exact number of columns in Sheet
2) Sometime you get extra columns in Excel file Sheets
3) Sometime you get less columns than your table definition.
4) Excel File can be provided with any name in given folder
5) Sheet Name can change as well in each Excel file

You want to developer an SSIS Package, that should be able to handle above situation and load all data for columns which match with our SQL Server Table Definition( dbo.Customer). 

It should ignore extra columns in Excel sheets and load the matching columns data without error. If we receive less columns in Excel sheet, It should go ahead and load that into table. What would be your approach to developer this type of SSIS Package?

This is how our excel files and sheets looks like
How to load data from Excel to SQL server Table for Matching Columns only - Script Task SSIS

Sheet Data for Customer_TechBrothersIT1.xlsx

Load Data from Excel Sheet Dynamically in SQL Server Table by using Script Task by using C#


Sheet Data for Customer_TechBrothersIT2.xlsx
Load multiple Excel Files with Single Sheet to SQL Server Table Script Task in SSIS


Sheet Data for Customer_TechBrothersIT3.xlsx
Import Data to SQL Server Table from Multiple Excel File with Single Sheets for Matching Column


Notice that we have multiple files with single sheet with more, less and equal number of columns to dbo.Customer Table.

Solution:

As Number of Columns are not fixed, there is plenty of things we need to do. With Excel Source available in SSIS, it is not going to be possible to verify the columns and load data only for matching columns. The best way could be we can use Script task in SSIS Package and use scripting language such as C# or VB. In our example, we will be using C#.

Step 1: 
Create your SSIS Package and then create below variables
FolderPath:  Provide the folder path where Excel files will be dropped.
TableName: Provide the table name in which you would like to load sheet data from each File.
SchemaName: Provide the Schema of Table.

Create Variables in SSIS Package for Dynamic Excel Loading to SQL Server Table- Script Task 


Step2: 
Create ADO.NET Connection Manager as shown below in your SSIS Package.You have to provide SQL Server Instance Name and Database. I have renamed connection to DBConn as can be seen below.
Create ADO.NET Connection to use in Script task in SSIS Package for Excel Dynamic Loading



Step 3: Map variables in Script Task
Bring Script task on Control Flow Task in SSIS Package, Open Script Task and Map variables as shown below.
Map variables in Script Task for Load SQL Server Table from Excel Files dynamically-SSIS Tutorial



Step 4: 
Click on Edit Script Button and then copy below code to your Script Task Editor.

Under #Region NameSpaces, paste below namespaces.

using System.IO;
using System.Data.OleDb;
using System.Data.SqlClient;


Got to public void Main()
{
and paste below code.

String FolderPath=Dts.Variables["User::FolderPath"].Value.ToString();
            String TableName = Dts.Variables["User::TableName"].Value.ToString();
            String SchemaName = Dts.Variables["User::SchemaName"].Value.ToString();
            var directory = new DirectoryInfo(FolderPath);
            FileInfo[] files = directory.GetFiles();
            
            //Declare and initilize variables
            string fileFullPath = "";
            
            //Get one Book(Excel file at a time)
            foreach (FileInfo file in files)
            {
                   fileFullPath = FolderPath+"\\"+file.Name;
                                   
                //Create Excel Connection
                string ConStr;
                string HDR;
                HDR="YES";
                ConStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileFullPath + ";Extended Properties=\"Excel 12.0;HDR=" + HDR + ";IMEX=0\"";
                    OleDbConnection cnn = new OleDbConnection(ConStr);
   
                //Get Sheet Name
                   cnn.Open();
                DataTable dtSheet = cnn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
                string sheetname;
                sheetname="";
           foreach (DataRow drSheet in dtSheet.Rows)
            {
                if (drSheet["TABLE_NAME"].ToString().Contains("$"))
                {
                     sheetname=drSheet["TABLE_NAME"].ToString();
                } 
            }

                //Load the DataTable with Sheet Data so we can get the column header
 OleDbCommand oconn = new OleDbCommand("select top 1 * from [" + sheetname + "]", cnn);
           OleDbDataAdapter adp = new OleDbDataAdapter(oconn);
           DataTable dt = new DataTable();
           adp.Fill(dt);
           cnn.Close();

   //Prepare Header columns list so we can run against Database to get matching columns for a table.
                    string ExcelHeaderColumn = "";
                    string SQLQueryToGetMatchingColumn = "";
                    for (int i = 0; i < dt.Columns.Count; i++)
                    {
                        if (i != dt.Columns.Count - 1)
                          ExcelHeaderColumn += "'" + dt.Columns[i].ColumnName + "'" + ",";
                        else
                            ExcelHeaderColumn += "'" + dt.Columns[i].ColumnName + "'";
                    }
                   
              SQLQueryToGetMatchingColumn="select STUFF((Select  ',['+Column_Name+']' from Information_schema.Columns where Table_Name='"+
      TableName+"' and Table_SChema='"+SchemaName+"'"+
     "and Column_Name in ("+@ExcelHeaderColumn+") for xml path('')),1,1,'') AS ColumnList";

               // MessageBox.Show(SQLQueryToGetMatchingColumn);
                //MessageBox.Show(ExcelHeaderColumn);
                
                //USE ADO.NET Connection
                    SqlConnection myADONETConnection = new SqlConnection();
myADONETConnection = (SqlConnection)(Dts.Connections["DBConn"].AcquireConnection(Dts.Transaction) as SqlConnection);

                //Get Matching Column List from SQL Server
                    string SQLColumnList = "";
                    SqlCommand cmd = myADONETConnection.CreateCommand();
                    cmd.CommandText = SQLQueryToGetMatchingColumn;
                    SQLColumnList = (string)cmd.ExecuteScalar();

                    //MessageBox.Show(" Matching Columns: " + SQLColumnList);


                    //Use Actual Matching Columns to get data from Excel Sheet
                    OleDbConnection cnn1 = new OleDbConnection(ConStr);
                    cnn1.Open();
OleDbCommand oconn1 = new OleDbCommand("select "+SQLColumnList+" from [" + sheetname + "]", cnn1);
                    OleDbDataAdapter adp1 = new OleDbDataAdapter(oconn1);
                    DataTable dt1 = new DataTable();
                    adp1.Fill(dt1);
                    cnn1.Close();


                //Load Data from DataTable to SQL Server Table.
                    using (SqlBulkCopy BC = new SqlBulkCopy(myADONETConnection))
                    {
                        BC.DestinationTableName = SchemaName+"."+TableName;
                        foreach (var column in dt1.Columns)
                            BC.ColumnMappings.Add(column.ToString(), column.ToString());
                        BC.WriteToServer(dt1);
                    }
                
            }

Save the script and Exit Script Task. 


Step 5:
Go ahead and run the package, it should load the data from multiple files to your Table. Remember each File has single sheet in this scenario. Also I noticed that BulkCopy is case sensitive when it comes to match the columns. Make sure you always have the Column Name in same case as you have in SQL Server Table.

Excel Data with Matching Column to SQL Server Table- SSIS Script Task C#






Sub Items Covered in above article:

How to load Data from Excel Files to SQL Server Table by using C#
How to Read Sheet Name from Excel File in SSIS Package
How to read Excel File Name in SSIS Package
How to Check matching Header Column Names to SQL Server Table by using C sharp
How to save scalar string value from SQL Server Query to Variable in C Sharp
How to use BulkCopy to load data from DataTable to SQL Server Table
How to Map Columns Dynamically in BulkCopy C Sharp




Check out our other posts/videos for Dynamic Excel Source and Destination
  1. How to Load Data from Excel Files when Number of Columns can decrease or order is changed in Excel Sheet
  2. How to Load Only Matching Column Data to SQL Server Table from Multiple Excel Files (Single Sheet per file) Dynamically in SSIS Package
  3. How to Load Excel File Names with Sheet Names ,Row Count,Last Modified Date, File Size in SQL Server Table
  4. How to Load Multiple Excel Files with Multiple Sheets to Single SQL Server Table by using SSIS Package
  5. How to Load Matching Sheets from Excel to Table and Log Not Matching Sheets Information in SQL Server Table
  6. How to create Table for each sheet in Excel Files and load data to it dynamically in SSIS Package
  7. How to Create Table per Excel File and Load all Sheets Data Dynamically in SSIS Package by using Script Task 
  8. How to create CSV file per Excel File and Load All Sheets from Excel File to it in SSIS Package
  9. How to Create CSV File for Each Excel Sheet from Excel Files in SSIS Package
  10. How to Load Excel File Name and Sheet Name with Data to SQL Server in SSIS Package
  11. How to Import data from Multiple Excel Sheets with a pattern of sheet names from Multiple Excel File in SSIS Package
  12. How to import Data from Excel Files for specific Sheet Name to SQL Server Table in SSIS Package
  13. Load Data To Tables according to Excel Sheet Names from Excel Files dynamically in SSIS Package
  14. How to Load Excel Files with Single/ Multiple Sheets to SQL Server Tables according to Excel File Name Dynamically
  15. How to Read Excel Sheet Data after Skipping Rows in SSIS Package by using Script Task 
  16. How to read data from Excel Sheet and Load to Multiple Tables by using Script Task in SSIS Package
  17. How to create Excel File Dynamically from SQL server Table/View by using Script Task in SSIS Package
  18. How to create Excel File Dynamically for Stored Procedure Results in SSIS Package by using Script Task
  19. How to Export SQL Server Tables from Database to Excel File Dynamically in SSIS Package by using Script Task
  20. How to Convert CSV/Text Files to Excel Files in SSIS Package by using Script Task
  21. How to Load All CSV Files to Excel Sheets ( Sheet Per CSV) in single Excel File in SSIS Package
  22. How to Load All CSV Files to Single Excel Sheet with File Names in an Excel File Dynamically in SSIS Package
  23. How to Create Sample Excel file with Sheet from each table with Top 1000 Rows per sheet in SSIS Package
  24. How to Export Data to Multiple Excel Sheets from Single SQL Server Table in SSIS Package

5 comments:

  1. When I try to connect I receive the following error:

    ystem.Data.OleDb.OleDbException (0x80004005): External table is not in the expected format.
    at System.Data.OleDb.OleDbConnectionInternal..ctor(OleDbConnectionString constr, OleDbConnection connection)
    at System.Data.OleDb.OleDbConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningObject)
    at System.Data.ProviderBase.DbConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)
    at System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup, DbConnectionOptions userOptions)
    at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
    at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
    at System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
    at System.Data.ProviderBase.DbConnectionInternal.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
    at System.Data.OleDb.OleDbConnection.Open()
    at ST_4cbc72eca9334175970bb0becb97fb76.ScriptMain.Main()

    ReplyDelete
  2. error: ---------------------------

    ---------------------------
    System.Data.OleDb.OleDbException (0x80004005): External table is not in the expected format.

    em System.Data.OleDb.OleDbConnectionInternal..ctor(OleDbConnectionString constr, OleDbConnection connection)

    em System.Data.OleDb.OleDbConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningObject)

    em System.Data.ProviderBase.DbConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)

    em System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup, DbConnectionOptions userOptions)

    em System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)

    em System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)

    em System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)

    em System.Data.ProviderBase.DbConnectionInternal.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)

    em System.Data.OleDb.OleDbConnection.Open()

    em ST_773529b6404a42fb9adfa9b82999f6b3.ScriptMain.Main()
    ---------------------------
    OK
    ---------------------------

    ReplyDelete