Welcome To TechBrothersIT: How to Find Duplicate Records in All the Tables in SQL Server Database

Scenario: Download Script

You are working on this big project called Data Analysis. you need to find all the duplicate records in all the tables in a database. The database contains hundreds of tables. It is taking forever to write query for each table and then run it and find the duplicate records. You want something that can run in one shot and provide you all the list of tables, columns, duplicate values and duplicate count.

Solution:

We are going to use Cursor so we can loop through tables in our database and use dynamic sql to build our query to find duplicate record. We will run this query for each table and then insert the records in #Results Temp Table.

At the end you will have one nice and neat table with Database Name, Schema Name,Table Name, Column List, Duplicate Value,Total Table Row count and Duplicate Count.

Few things to remember before you go ahead and run this script on Production. As this script is going to generate queries on fly and run them one after one. If your database is big and has tables with a lot or records, it can take long time to run. Find the best when there is very minimum activity on server or try to schedule on weekend when have very limited processes running on server.

USE yourdbname

go

DECLARE @SchemaName VARCHAR(100)
DECLARE @TableName VARCHAR(100)
DECLARE @DatabaseName VARCHAR(100)

--Create Temp Table to Save Results
IF Object_id('tempdb..#Results') IS NOT NULL
  DROP TABLE #results

CREATE TABLE #results
  (
     databasename       VARCHAR(100),
     schemaname         VARCHAR(100),
     tablename          VARCHAR(100),
     columnlist         VARCHAR(max),
     duplicatevalue     VARCHAR(max),
     totaltablerowcount INT,
     duplicaterowcnt    INT
  )

DECLARE cur CURSOR FOR
  SELECT table_catalog,
         table_schema,
         table_name
  FROM   information_schema.tables
  WHERE  table_type = 'BASE TABLE'

OPEN cur

FETCH next FROM cur INTO @DatabaseName, @SchemaName, @TableName

WHILE @@FETCH_STATUS = 0
  BEGIN
      --Get List of the Columns from Table without Identity Column
      DECLARE @ColumnList NVARCHAR(max)=NULL

      SELECT @ColumnList = COALESCE(@ColumnList + '],[', '') + c.NAME
      FROM   sys.columns c
             INNER JOIN sys.tables t
                     ON c.object_id = t.object_id
      WHERE  Object_name(c.object_id) = @TableName
             AND Schema_name(schema_id) = @SchemaName
             AND is_identity = 0

      SET @ColumnList='[' + @ColumnList + ']'

      --Print @ColumnList
      DECLARE @ColumnListConcat VARCHAR(max)=NULL

      SET @ColumnListConcat=Replace(Replace(Replace(Replace(@ColumnList, '[',
                                                    'ISNULL(Cast(['), ']',
                                            '] AS VARCHAR(MAX)),''NULL'')'),
                                            ',ISNULL', '+ISNULL'), '+',
                            '+'',''+')

      --Create Dynamic Query for Finding duplicate Records
      DECLARE @DuplicateSQL NVARCHAR(max)=NULL

      SET @DuplicateSQL= ';With CTE as   (select  '''
                         + @DatabaseName + ''' AS DBName,' + ''''
                         + @SchemaName + ''' AS SchemaName,' + ''''
                         + @TableName + ''' AS TableName,' + ''''
                         + @ColumnList + ''' AS ColumnList,'
                         + @ColumnListConcat
                         + ' AS ColumnConcat,    (Select count(*) from [' + @SchemaName
                         + '].[' + @TableName
                         + '] With (Nolock))            AS TotalTableRowCount    ,RN = row_number()             over(PARTITION BY '
                         + @ColumnList + '  order by ' + @ColumnList
                         + ')             from [' + @SchemaName + '].['
                         + @TableName + '] ) Select * From CTE WHERE RN>1'

      PRINT @DuplicateSQL

      INSERT INTO #results
      EXEC(@DuplicateSQL)

      FETCH next FROM cur INTO @DatabaseName, @SchemaName, @TableName
  END

CLOSE cur

DEALLOCATE cur

SELECT *
FROM   #results
--drop table #Results

I ran this script on my database, My Database is small so it took few seconds to complete. I got below results for my tables.It found all the duplicate records in all the table if there were any in SQL Server Database.

How to Find duplicate Records in all the tables in SQL server Database - SQL Server / TSQL Tutorial

Check out our other posts on Data Analysis / Data Validation / Data Cleansing

Welcome To TechBrothersIT

Label

How to Find Duplicate Records in All the Tables in SQL Server Database - SQL Server / TSQL Tutorial

Scenario: Download Script

Solution:

No comments:

Post a Comment