SSIS - Replace Multiple Characters/Words From Column [How to Use Script Component]

Scenario:

Let’s say we need to load data from flat file source. The Name column in flat file has some non ascii characters as well some other words those we do not want to load and want to replace with blank space. To replace these characters and words we used derived column but after few days we got a new list of characters or garbage words those we also have to replace. We do not want to open our package again and write expressions in Derived column to replace. Our goal is to build a SSIS Package where we can create a list of garbage words and then use that to clean data in our Name column.

Solution:

As our garbage words list can change anytime so we will be creating a variable in our SSIS Package to hold list of garbage characters or words. Each character or word will be separated by using Pipe (|). We can use SSIS Configuration to provide values to this variable or we provide the values in SQL Server agent job.
Here are steps how we are going to build our SSIS Package.

Step 1:
Create a variable in SSIS Package and provide the characters/words those you want to replace in your input column with blank space.
I have created “BadWordsList” variable with some characters and words “Shahzad|Test|Œ|Õ|Û|ÞÞ” those are garbage for me and want to replace them in my input column with blank space.




Step 2:
Create connection to your text file by using flat file connection manager. I have used below data for this example in text file.
Name, Age
AamirŒ Shahzad Test,33
RazaÛ Ali Shahzad,32
Najaf Ali Shahzad,11
ÕJuily,23
ÞÞRobert,54

Step 3:
Bring Script component in Data Flow Pane and then connect Flat File Source to it and configure as shown in snapshots.





Code that I included in Script Component  can be seen in snapshot.

using System.Text.RegularExpressions;
 string garbagelist = Variables.BadWordsList;
 Row.SCName = Regex.Replace(Row.Name, garbagelist, "", RegexOptions.IgnoreCase);


Final Output : 

Add Multicast Transformation in Data Flow task and Connect Script Task to it( We are using Multicast here just for testing). Add Data Viewer between Script Component and Multicast to see the results.

As we can see in the snapshot that all the garbage characters/words are removed with blank space by using script component and SC_Name column does not have any of those characters/words.



7 comments:

  1. I think SSIS and PostgreSQL are always the best when it comes to knowing about some very useful and complex IT problems.

    SSIS PostgreSql Write

    ReplyDelete
  2. A very awesome blog post. We are really grateful for your blog post. You will find a lot of approaches after visiting your post. Plumbers in Glendale CA

    ReplyDelete
  3. you have good information but some people tell me this website is also good what you think!!
    Plumber Fremont CA

    ReplyDelete
  4. HI Team,
    Good Work, it's really help me.
    Am using these symbols in variable "!|@|#|$|%|&|*"
    Have trying same but am getting few errors as shown in bellow

    [Script Component [29]] Error: System.ArgumentException: parsing "!|@|#|$|%|&|*" - Quantifier {x,y} following nothing.
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.ProcessInput(Int32 inputID, PipelineBuffer buffer)
    at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostProcessInput(IDTSManagedComponentWrapper100 wrapper, Int32 inputID, IDTSBuffer100 pDTSBuffer, IntPtr bufferWirePacket)

    ReplyDelete
  5. UP Board 9th Previous Question Paper 2023 UP 9th Previous Question Paper 2023 UP Board 9th class Model Paper 2023, UP board Model Question Paper 2023 Syllabus, UP 9th Model Question Paper 2023 UPMSP 9th class Sample Paper,UPMSP 9th Model Paper 2023, UP 9TH Model Question Paper 2023 Syllabus wise, UP 9th Model Question Paper 2023 UP Board 9th Previous Question Paper 2023, UP 9th Previous Question Paper 2023, UP 9th Model Question Paper 2023.

    ReplyDelete