Welcome To TechBrothersIT: How to Find & Delete Duplicate Records on All Tables in SQL Server Database

Scenario: Download Script

You are working as SQL Server developer / TSQL developer for Credit Card Company. You are working on one of the big project that requires you to find the duplicate records in all the tables in SQL Server Database and delete them. You started with writing query and soon realize that there are hundreds of tables and it will take you forever to write query for each of the table. You want to find the script that can find duplicate records for you and then log the information and delete them. If that is the case you are at right place!

Solution:

As we need to find the duplicate records in all the tables, we need loop. In our case we will be using Cursor. We will loop through list of schema and table and then prepare our find duplicate query and run it and log the information in temp table #Result.

By doing that we will be able to provide details which records we have deleted.

In second part, we will prepare our query to delete those duplicate records. We are using Common Table Expressions to find duplicate records.

The log table will return us DataBase Name, Schema Name, Table Name, Column List, Duplicate Value, Total Row Count for Table and Duplicate Row Count.

Few things to consider before you run this script in production

1) As this script is going to run on all the tables in a database, this can be expensive process. Make sure you test in lower environments such as QA,UAT before run in Production.

2) As delete operation is involved, you want to make sure there is very less activity on database. Find a time such as weekend or off hours for this operation.

USE yourdbname

go

DECLARE @SchemaName VARCHAR(100)
DECLARE @TableName VARCHAR(100)
DECLARE @DatabaseName VARCHAR(100)

--Create Temp Table to Save Results
IF Object_id('tempdb..#Results') IS NOT NULL
  DROP TABLE #results

CREATE TABLE #results
  (
     databasename       VARCHAR(100),
     schemaname         VARCHAR(100),
     tablename          VARCHAR(100),
     columnlist         VARCHAR(max),
     duplicatevalue     VARCHAR(max),
     totaltablerowcount INT,
     duplicaterowcnt    INT
  )

DECLARE cur CURSOR FOR
  SELECT table_catalog,
         table_schema,
         table_name
  FROM   information_schema.tables
  WHERE  table_type = 'BASE TABLE'

OPEN cur

FETCH next FROM cur INTO @DatabaseName, @SchemaName, @TableName

WHILE @@FETCH_STATUS = 0
  BEGIN
      --Get List of the Columns from Table without Identity Column
      DECLARE @ColumnList NVARCHAR(max)=NULL

      SELECT @ColumnList = COALESCE(@ColumnList + '],[', '') + c.NAME
      FROM   sys.columns c
             INNER JOIN sys.tables t
                     ON c.object_id = t.object_id
      WHERE  Object_name(c.object_id) = @TableName
             AND Schema_name(schema_id) = @SchemaName
             AND is_identity = 0

      SET @ColumnList='[' + @ColumnList + ']'

      --Print @ColumnList
      DECLARE @ColumnListConcat VARCHAR(max)=NULL

      SET @ColumnListConcat=Replace(Replace(Replace(Replace(@ColumnList, '[',
                                                    'ISNULL(Cast(['), ']',
                                            '] AS VARCHAR(MAX)),''NULL'')'),
                                            ',ISNULL', '+ISNULL'), '+',
                            '+'',''+')

      --Create Dynamic Query for Finding duplicate Records
      DECLARE @DuplicateSQL NVARCHAR(max)=NULL
      DECLARE @DeleteDuplicateSQL NVARCHAR(max)=NULL

      SET @DuplicateSQL= ';With CTE as   (select  '''
                         + @DatabaseName + ''' AS DBName,' + ''''
                         + @SchemaName + ''' AS SchemaName,' + ''''
                         + @TableName + ''' AS TableName,' + ''''
                         + @ColumnList + ''' AS ColumnList,'
                         + @ColumnListConcat
                         + ' AS ColumnConcat,    (Select count(*) from ['
                         + @SchemaName + '].[' + @TableName
                         + '] With (Nolock))             AS TotalTableRowCount    ,RN = row_number()     over(PARTITION BY ' + @ColumnList
                         + '  order by ' + @ColumnList
                         + ')             from [' + @SchemaName + '].['
                         + @TableName
                         + ']  ) Select * From CTE WHERE RN>1'
      -- PRINT @DuplicateSQL
      --Prepate Delete Duplicate Records Query
      SET @DeleteDuplicateSQL=
      ';With CTE as   (select  RN = row_number()             over(PARTITION BY '
      + @ColumnList + '  order by ' + @ColumnList
      + ')             from [' + @SchemaName + '].['
      + @TableName
      + ']  ) Delete from  CTE WHERE RN>1'

      --Log Duplicate Record information in Temp Table
      INSERT INTO #results
      EXEC(@DuplicateSQL)

      --Run Dynamic SQL to Delete Duplicate
      EXEC(@DeleteDuplicateSQL)

      FETCH next FROM cur INTO @DatabaseName, @SchemaName, @TableName
  END

CLOSE cur

DEALLOCATE cur

SELECT *
FROM   #results
--drop table #Results

I have execute above script on my database after inserting few duplicate records in my table and here is my output.

How to find and Delete duplicate records in all the Tables in SQL server Database - SQL Server / T-SQL Tutorial

Check out our other posts on Data Analysis / Data Validation / Data Cleansing

Welcome To TechBrothersIT

Label

How to Find & Delete Duplicate Records on All Tables in SQL Server Database - SQL Server / TSQL Tutorial

Scenario: Download Script

Solution:

No comments:

Post a Comment