SSIS - Replace Multiple Characters/Words From Column [How to Use Script Component]


Let’s say we need to load data from flat file source. The Name column in flat file has some non ascii characters as well some other words those we do not want to load and want to replace with blank space. To replace these characters and words we used derived column but after few days we got a new list of characters or garbage words those we also have to replace. We do not want to open our package again and write expressions in Derived column to replace. Our goal is to build a SSIS Package where we can create a list of garbage words and then use that to clean data in our Name column.


As our garbage words list can change anytime so we will be creating a variable in our SSIS Package to hold list of garbage characters or words. Each character or word will be separated by using Pipe (|). We can use SSIS Configuration to provide values to this variable or we provide the values in SQL Server agent job.
Here are steps how we are going to build our SSIS Package.

Step 1:
Create a variable in SSIS Package and provide the characters/words those you want to replace in your input column with blank space.
I have created “BadWordsList” variable with some characters and words “Shahzad|Test|Œ|Õ|Û|ÞÞ” those are garbage for me and want to replace them in my input column with blank space.

Step 2:
Create connection to your text file by using flat file connection manager. I have used below data for this example in text file.
Name, Age
AamirΠShahzad Test,33
RazaÛ Ali Shahzad,32
Najaf Ali Shahzad,11

Step 3:
Bring Script component in Data Flow Pane and then connect Flat File Source to it and configure as shown in snapshots.

Code that I included in Script Component  can be seen in snapshot.

using System.Text.RegularExpressions;
 string garbagelist = Variables.BadWordsList;
 Row.SCName = Regex.Replace(Row.Name, garbagelist, "", RegexOptions.IgnoreCase);

Final Output : 

Add Multicast Transformation in Data Flow task and Connect Script Task to it( We are using Multicast here just for testing). Add Data Viewer between Script Component and Multicast to see the results.

As we can see in the snapshot that all the garbage characters/words are removed with blank space by using script component and SC_Name column does not have any of those characters/words.