How to Load Text /CSV files with same or less columns than destination table by using Script Task in SSIS Package - SSIS Tutorial

Scenario: Download Script

We get multiple text of csv files from different clients those we need to load to our dbo.Customer table. Not a problem, we can create an SSIS Package that can load multiple text files to SQL Server table and we have already did by using this post.But there is little problem. Each client will not send exact number of columns what we have for our table.

Create table dbo.Customer(
Id INT,
Name VARCHAR(100),
Dob Date)
 



Think about that client A sends always file with id, name and dob columns.
Client B can send file with Id and Name. Also Client B can send some files with three columns id,name and dob.
Client C can also send the file with only column Name and Dob or any other combination.

One thing to noticed here, the files can have same columns or less but will never have more columns than SQL Server table. We need to create an SSIS Package that should be able to load these type of files.

This package should be like template, that we can use for any type of related requirement and can change destination table name,Source folder, Archive Folder, File Extension and File Delimiter etc. by using SSIS Configuration.

Sample data files using for my demo. As you can see that I have three files each one has different combination of columns but all matches with columns  of SQL Server table.



Solution:

We will be using Script Task in SSIS Package to load all the text files to SQL Server table with same or less columns than table definition.

Step 1: Create New SSIS Package and Variables 
Open SSDT ( SQL Server Data Tools) and create new SSIS Package. After that go ahead and create variables so we can use them in configuration to pass values anytime our requirement change.

ArchiveFolder: Provide the folder path where you would like to move files after loading. Datetime part will be added to file name.
DestinationTable : Provide the Destination Table name with Schema as shown where you would like to load flat files( text files or csv files)
FileDelimiter : Provide the delimiter which is used in your txt or csv files.
FileExtension : Provide the Extension of files you would like to load from folder.
LogFolder : Provide the folder path where you would like to create log file in case of error in script task
SourceFolder: Provide the source folder path where text files or csv files are places for import process.

Create Variable in SSIS Package to Load and Archive flat files by using Script Task -SSIS tutorial



Step 2:

Click in Connection Manager Pane and then Create ADO.NET Connection by providing Server Name and database Name. After creating I have renamed it to DBConn.

Create ADO.NET Connection so we can use in Script Task to load the data to Destination Table 


Step 3: Add Variables to Script Task to use from SSIS Package
Bring the Script Task on Control Flow Pane in SSIS Package and open by double clicking Check-box in front of variable to add to Script Task.
Add variables to Script Task so we can load multiple Text files or csv files to SQL Server Table in SSIS Package


Step 4: Add Script to Script task Editor in SSIS Package to import multiple CSV files or Text Files to SQL Server Table
Click Edit Button and it will open Script Task Editor.
Under #region Namespaces, I have added below code

using System.IO;
using System.Data.SqlClient;


Under public void Main() { 
I have added below code.


            string datetime = DateTime.Now.ToString("yyyyMMddHHmmss");
            try
            {

                //Declare Variables
            string SourceFolderPath = Dts.Variables["User::SourceFolder"].Value.ToString();
            string FileExtension = Dts.Variables["User::FileExtension"].Value.ToString();
            string FileDelimiter = Dts.Variables["User::FileDelimiter"].Value.ToString();
            string TableName = Dts.Variables["User::DestinationTable"].Value.ToString();
            string ArchiveFolder = Dts.Variables["User::ArchiveFolder"].Value.ToString();
                //string ColumnList = "";

                //Reading file names one by one
           string SourceDirectory = SourceFolderPath;
           string[] fileEntries = Directory.GetFiles(SourceDirectory, "*" + FileExtension);
                foreach (string fileName in fileEntries)
                {

           SqlConnection myADONETConnection = new SqlConnection();
           myADONETConnection = (SqlConnection)
           (Dts.Connections["DBConn"].AcquireConnection(Dts.Transaction) as SqlConnection);

                    //Writing Data of File Into Table
                    int counter = 0;
                    string line;
                    string ColumnList = "";
                    //MessageBox.Show(fileName);

                    System.IO.StreamReader SourceFile =
                    new System.IO.StreamReader(fileName);
                    while ((line = SourceFile.ReadLine()) != null)
                    {
                        if (counter == 0)
                        {
                            ColumnList = "[" + line.Replace(FileDelimiter, "],[") + "]";

                        }
                        else
                        {
                      string query = "Insert into " + TableName + " (" + ColumnList + ") ";
                      query += "VALUES('" + line.Replace(FileDelimiter, "','") + "')";

                            // MessageBox.Show(query.ToString());
                        SqlCommand myCommand1 = new SqlCommand(query, myADONETConnection);
                            myCommand1.ExecuteNonQuery();
                        }

                        counter++;
                    }

                    SourceFile.Close();
                    //move the file to archive folder after adding datetime to it
                    File.Move(fileName, ArchiveFolder + "\\" + 
                        (fileName.Replace(SourceFolderPath,"")).Replace(FileExtension,"") 
                        + "_" + datetime+FileExtension);
                    Dts.TaskResult = (int)ScriptResults.Success;
                }
            }
            catch (Exception exception)
            {
                // Create Log File for Errors
                using (StreamWriter sw = File.CreateText(Dts.Variables["User::LogFolder"].Value.ToString()
                    + "\\" + "ErrorLog_" + datetime + ".log"))
                {
                    sw.WriteLine(exception.ToString());
                    Dts.TaskResult = (int)ScriptResults.Failure;
                }

            }



Step 5: Save Script and Run SSIS Package
Save the script in Script Editor and execute your SSIS Package, It should load the files from Source folder to SQL Server table and move the file to archive folder after loading it.

Here are records loaded by package in dbo.Customer table from my sample files.
Data load to Table from Multiple Text files with less or same Columns as Table definition in SSIS Package by using Script Task


 Files moved to Archive Folder after datetime added to them as shown below.
How to load CSV files with same or less columns to SQL server Table in SSIS Package




 Related Posts / Videos on Dynamic Text / CSV files by Script Task 


10 comments:

  1. What if we have comma inside the data columns - it considers as a column which is not correct, can you help me?

    ReplyDelete
    Replies
    1. I'm going to have to solve this same issue as I implement this solution.
      General CSV format would probably have quotes around any column "value" that includes a comma.
      Knowing that, I am going to focus on the following line of code for a way to find all such instances of data enclosed in quotes with a comma that follows.

      string query = "Insert into " + TableName + " (" + ColumnList + ") ";
      query += "VALUES('" + line.Replace(FileDelimiter, "','") + "')";

      Definitely activating that messagebox will help see what's being written.
      In the end,, something from the CSV like : "Bob, and Sons"
      will need to be converted in the query to: 'Bob, and Sons'
      Before trying to write to the DB.
      I'd be glad to collaborate on the solution and have checked the Notify Me box

      Delete
  2. Excellent article - this will assist me with several issues of "Random Data Formats" that my partners are sending me.

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. how would you handle text qualifiers if exists for some records?

    ReplyDelete
  5. I have a CSV file which contains 10 records out of which 9 records are in correct format and the 10th record is having a invalid date so that for me 9 records should get inserted into the SQL table for the 10th record in the log file i should get the error. Can you please help me out

    ReplyDelete
  6. This comment has been removed by the author.

    ReplyDelete
  7. Hi,

    So I am getting this error and I am not sure how to fix it:

    System.Data.SqlClient.SqlException (0x80131904): There are more columns in the INSERT statement than values specified in the VALUES clause. The number of values in the VALUES clause must match the number of columns specified in the INSERT statement.
    at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
    at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
    at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
    at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
    at System.Data.SqlClient.SqlCommand.RunExecuteNonQueryTds(String methodName, Boolean async, Int32 timeout, Boolean asyncWrite)
    at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry)
    at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
    at ST_2301a917d093437cb5cbcc3dc745b478.ScriptMain.Main()
    ClientConnectionId:06147bcb-a833-425b-9f04-e12f59ee2098
    Error Number:109,State:1,Class:15

    What is basically happening is that I am loading a csv file with about 11K records, whenever it finish loading record number 9291 it throws that error, meaning record 9292 probably has an issue. The file I am loading has fewer records than the destination table in SQL, but that ain't an issue coz this Task Script should be able to load the file.

    Any suggestion on what the issue could be, thanks in advaance.

    ReplyDelete
  8. can you also insert the filename, fileloaddate using the same code

    ReplyDelete
  9. How to Load CSV files with same or more columns in SSIS.

    we have two file - csv1, csv2

    csv1 has 9 columns and csv2 has 10 columns.

    ReplyDelete