How to Load Data from Excel Files when Number of Columns can decrease or order is changed in Excel Sheet - SSIS tutorial

Scenario:Link to Script

Let's say you are working for Auto Insurance Company as ETL developer, you get different excel files from different regional office those you need to load to Dbo.Customer Table in TechBrothersIT database.
Here is the definition of SQL server table.

CREATE TABLE dbo.Customer (
    id INT
    ,name VARCHAR(50)
    ,dob DATE
    )
GO

But you often face problem as you don't always get the exact columns.Sometime you get a nice excel file with Id,Name and Dob and often you get the file with only Name column and sometime you get the file with Id and name. 

You notice from above, you will be always getting the file with correct column names but sometime less or more or sometime with exact same number. 

Consider, I got three below files as shown below and need to load to dbo.Customer Table.

You notices that I did not mention the name of Sheet, It can be anything as long as it has only one sheet. As for this scenario, I am considering there will be always single sheet.


How to load Excel files dynamically to SQL Server Table when Column are not exact Same in SSIS

Let's check the data in the Sheet in each file.
Sheet Data from Customer_TechBrothersIT_NorthAmerica.xlsx - Load Excel Dynamically in SSIS

Sheet data from Customer_TechBrothersIT_Asia.xlsx - How to Load Excel Sheet Data Dynamically in SSIS


Sheet Data from Customer_TechBrothersIT_03032016.xlsx- Load Excel Dynamically in SSIS

If you notices that, in some Excel files, the sheet has columns in order as we have in SQL Server Table. In some of them the order of columns is messed up and even in some of them the columns are missing.

Now the questions is , How to Load these different sheets to single Table in SQL Server Table, What solution you will propose in this situation as ETL Developer?

Solution:

We can't handle above situation with builtin Excel Source as number of columns are not constant in our sheets. Also the Sheet name is not confirmed and we have to work hard to make that dynamic by using expressions. I will suggest to handle this situation in Script Task.

Step 1:
Let's create two variables. 
FolderPath: that will contain the folder path where our files will be dropped or put.
TableName: Provide the table name.

By using above variable, we are making our SSIS Package dynamic. You can simply change values for two variables and start loading the data to new table from your excel files. Also this is really helpful when you are deploying your SSIS Package to QA,UAT and Prod as Folder Path is often different in each environment.

Create Variables in SSIS Package to Load Excel Sheet Dynamically in SSIS Package by Script Task


Step 2:
Create ADO.NET Connection Manager so we can use in Script Task to Load data from Excel Sheets to SQLServer Table.

Create ADO.NET Connection in SSIS Package to use in Script Task for Dynamic Excel Loading

Once you will click on New ADO.NET Connection, you will be choose your SQL Server and Database. I have created DBConn as you can see in above snapshot.

Step3: Map the Variables in Script Task ( FolderPath and TableName)
Bring the Script Task on Control Flow Pane in SSIS Package and open by double clicking. Map the variables as shown below.

Map variables in Script Task to use for Dynamic Excel Loading- SSIS Tutorial 

You have noticed that I selected Microsoft Visual C# 2012 for script in Script Task. You can use VB, if you like but below code is provided in C#.


Step 4:
Click Edit Button and it will open Script Task Editor.
Under #region Namespaces, I have added below code
using System.IO;
using System.Data.OleDb;
using System.Data.SqlClient;


Under public void Main() { 
I have added below code.


String FolderPath=Dts.Variables["User::FolderPath"].Value.ToString();
            String TableName = Dts.Variables["User::TableName"].Value.ToString();
            var directory = new DirectoryInfo(FolderPath);
            FileInfo[] files = directory.GetFiles();
            
            //Declare and initilize variables
            string fileFullPath = "";
            
            //Get one Book(Excel file at a time)
            foreach (FileInfo file in files)
            {
              
                    fileFullPath = FolderPath+"\\"+file.Name;
                                   
                //Create Excel Connection
                string ConStr;
                string HDR;
                HDR="YES";
                ConStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileFullPath + ";Extended Properties=\"Excel 12.0;HDR=" + HDR + ";IMEX=0\"";
                    OleDbConnection cnn = new OleDbConnection(ConStr);

                
                //Get Sheet Name
                   cnn.Open();
                DataTable dtSheet = cnn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
                string sheetname;
                sheetname="";
           foreach (DataRow drSheet in dtSheet.Rows)
            {
                if (drSheet["TABLE_NAME"].ToString().Contains("$"))
                {
                     sheetname=drSheet["TABLE_NAME"].ToString();
                } 
            }

         //Load the DataTable with Sheet Data
         OleDbCommand oconn = new OleDbCommand("select * from ["+sheetname+"]", cnn);
                    OleDbDataAdapter adp = new OleDbDataAdapter(oconn);
                    DataTable dt = new DataTable();
                    adp.Fill(dt);
                    cnn.Close();
             
                  
//Use the ADO.NET connection and Load the data from DataTable to SQL Table
  SqlConnection myADONETConnection = new SqlConnection();
  myADONETConnection = (SqlConnection)(Dts.Connections["DBConn"].AcquireConnection(Dts.Transaction) as SqlConnection);

  using (SqlBulkCopy BC = new SqlBulkCopy(myADONETConnection))
                    {
                        BC.DestinationTableName = TableName;
                        foreach (var column in dt.Columns)
                            BC.ColumnMappings.Add(column.ToString(), column.ToString());
                        BC.WriteToServer(dt);
                    }
                
            }



Step 5:
Save the code and exit Script task. Run the SSIS Package and go to SSMS and check the data in the table. 
How to load data from Excel file dynamically in SQL Server Table by Script Task in SSIS Package


As you can see that the data from three files with single sheet is loaded to our table. As you don't want to load the files again, it is good idea to move the files to some other folder after loading. you can check this link to see how the files can be archived. You can use Script Task itself or File System task to archive the files after loading.


Video Demo How to Import Excel Files to SQL Server Table with Exact Number or columns or less Columns in Excel Sheets



Check out our other posts/videos for Dynamic Excel Source and Destination
  1. How to Load Data from Excel Files when Number of Columns can decrease or order is changed in Excel Sheet
  2. How to Load Only Matching Column Data to SQL Server Table from Multiple Excel Files (Single Sheet per file) Dynamically in SSIS Package
  3. How to Load Excel File Names with Sheet Names ,Row Count,Last Modified Date, File Size in SQL Server Table
  4. How to Load Multiple Excel Files with Multiple Sheets to Single SQL Server Table by using SSIS Package
  5. How to Load Matching Sheets from Excel to Table and Log Not Matching Sheets Information in SQL Server Table
  6. How to create Table for each sheet in Excel Files and load data to it dynamically in SSIS Package
  7. How to Create Table per Excel File and Load all Sheets Data Dynamically in SSIS Package by using Script Task 
  8. How to create CSV file per Excel File and Load All Sheets from Excel File to it in SSIS Package
  9. How to Create CSV File for Each Excel Sheet from Excel Files in SSIS Package
  10. How to Load Excel File Name and Sheet Name with Data to SQL Server in SSIS Package
  11. How to Import data from Multiple Excel Sheets with a pattern of sheet names from Multiple Excel File in SSIS Package
  12. How to import Data from Excel Files for specific Sheet Name to SQL Server Table in SSIS Package
  13. Load Data To Tables according to Excel Sheet Names from Excel Files dynamically in SSIS Package
  14. How to Load Excel Files with Single/ Multiple Sheets to SQL Server Tables according to Excel File Name Dynamically
  15. How to Read Excel Sheet Data after Skipping Rows in SSIS Package by using Script Task 
  16. How to read data from Excel Sheet and Load to Multiple Tables by using Script Task in SSIS Package
  17. How to create Excel File Dynamically from SQL server Table/View by using Script Task in SSIS Package
  18. How to create Excel File Dynamically for Stored Procedure Results in SSIS Package by using Script Task
  19. How to Export SQL Server Tables from Database to Excel File Dynamically in SSIS Package by using Script Task
  20. How to Convert CSV/Text Files to Excel Files in SSIS Package by using Script Task
  21. How to Load All CSV Files to Excel Sheets ( Sheet Per CSV) in single Excel File in SSIS Package
  22. How to Load All CSV Files to Single Excel Sheet with File Names in an Excel File Dynamically in SSIS Package
  23. How to Create Sample Excel file with Sheet from each table with Top 1000 Rows per sheet in SSIS Package
  24. How to Export Data to Multiple Excel Sheets from Single SQL Server Table in SSIS Package


5 comments:

  1. It's really helpful. I have one doubt over here. Here ,table is created already and script task dumps the data dynamically .How to create tables at runtime ?

    ReplyDelete
    Replies
    1. Hi ashwini, you can extract the column name from the excel sheet and send it as comma separated variable through a SQL statement to the SQL server. There run an SP which will create a table based on the column name.

      OleDbCommand oconn = new OleDbCommand("select top 1 * from [" + NewSheetname + "]", cnn);
      OleDbDataAdapter adp = new OleDbDataAdapter(oconn);
      oconn.CommandTimeout = 600;
      DataTable dt = new DataTable();
      adp.Fill(dt);
      cnn.Close();
      //Prepare Header columns list so we can run against Database to get matching columns for a table.
      //string ExcelHeaderColumn = "";
      string SQLQueryToGetMatchingColumn = "";
      string ExcelHeaderColumn = "";
      for (int i = 0; i < dt.Columns.Count; i++)
      {
      if (i != dt.Columns.Count - 1)
      {
      //ExcelHeaderColumn += "'" + dt.Columns[i].ColumnName + "'" + ",";
      ExcelHeaderColumn += '[' + dt.Columns[i].ColumnName + ']' + ",";
      }
      else
      {
      //ExcelHeaderColumn += "'" + dt.Columns[i].ColumnName + "'";
      ExcelHeaderColumn += '[' + dt.Columns[i].ColumnName + ']';
      }
      }
      //ExcelHeaderColumn will be as '[Column1],[Column2],[Column3],[Column4]'

      DBConnect("Give SP name here", NewSheetname, ExcelHeaderColumn);//this is to connect to DB.
      //DBConnect is a function which connect to database

      public void DBConnect(string proc, string sheetName, string ExcelHeaderColumn)
      {
      SqlCommand cmd = new SqlCommand(proc, db_connect);
      cmd.CommandTimeout = 600;


      string NewSheetName = sheetName.Remove(sheetName.Length - 1);

      cmd.Parameters.Add("@columnList", SqlDbType.NVarChar).Value = ExcelHeaderColumn;
      cmd.Parameters.Add("@SheetName", SqlDbType.NVarChar).Value = NewSheetName;

      cmd.CommandType = CommandType.StoredProcedure;
      //SQLColumnList = (string)cmd.ExecuteScalar();
      //result = cmd.ExecuteReader().ToString();

      DataSet dsData = new DataSet();

      using (SqlDataAdapter adpSPResult = new SqlDataAdapter(cmd))
      {
      //try { adp.Fill(dsData); }
      adpSPResult.Fill(dsData);
      //catch (Exception ex) { }
      }
      DataTable dtResult = dsData.Tables[0];
      if (dtResult.Rows[0]["ColumnList"].ToString().Length > 1)
      {
      tableName = Dts.Variables["User::TableName"].Value.ToString();
      SQLColumnList = dtResult.Rows[0]["ColumnList"].ToString();
      }
      else
      {
      string justtoverify = "Error";
      }
      //else
      //{
      // tableName = Dts.Variables["User::TableName"].Value.ToString();
      // SQLColumnList = dtResult.Rows[0]["ColumnList"].ToString();
      //}

      }


      for writing the SP to make comma separated variable to table use the following link.

      https://stackoverflow.com/questions/16023245/how-to-create-a-table-structure-from-column-values-of-another-table-in-mssql


      Delete
  2. thank you very much ! very Helpful

    ReplyDelete
  3. But the day before returning to the country, the purchase of time, I ghost and walked to the Liverpool city center of the Rau store, went in to turn around and found nothing, just ready to go when the previous sales * * * sister called me,replica rolex watches said there is a yacht Mr. want to see, I was the whole person is not good, because I spent part of my money, want to mention this piece of words simply not enough, from the domestic turn of words time and too late. So try to wear a bit bear to give up.A few months ago I had the idea of trying out some high end or precious metal models.fake LV handbags Of course the first thing I wanted was cheap louboutin red bottoms an Audemars Piguet Royal Oak, but the case was too big for my wrist after I'd had a go at the 15500, and now that Audemars Piguet is trying to tighten up their authorised shops, getting a popular steel model at AP Boutique is beyond most people's reach, and my favourite 15202 Jumbo is still too far away. The 15202 Jumbo is still too far away.

    ReplyDelete