How to Find Duplicate Records in All the Tables in SQL Server Database - SQL Server / TSQL Tutorial

Scenario: Download Script

You are working on this big project called Data Analysis. you need to find all the duplicate records in all the tables in a database. The database contains hundreds of tables. It is taking forever to write query for each table and then run it and find the duplicate records. You want something that can run in one shot and provide you all the list of tables, columns, duplicate values and duplicate count.


We are going to use Cursor so we can loop through tables in our database and use dynamic sql to build our query to find duplicate record. We will run this query for each table and then insert the records in #Results Temp Table. 

At the end you will have one nice and neat table with Database Name, Schema Name,Table Name, Column List, Duplicate Value,Total Table Row count and Duplicate Count.

Few things to remember before you go ahead and run this script on Production. As this script is going to generate queries on fly and run them one after one. If your database is big and has tables with a lot or records, it can take long time to run. Find the best when there is very minimum activity on server or try to schedule on weekend when have very limited processes running on server. 

USE yourdbname 


DECLARE @SchemaName VARCHAR(100) 
DECLARE @TableName VARCHAR(100) 
DECLARE @DatabaseName VARCHAR(100) 

--Create Temp Table to Save Results 
IF Object_id('tempdb..#Results') IS NOT NULL 
  DROP TABLE #results 

CREATE TABLE #results 
     databasename       VARCHAR(100), 
     schemaname         VARCHAR(100), 
     tablename          VARCHAR(100), 
     columnlist         VARCHAR(max), 
     duplicatevalue     VARCHAR(max), 
     totaltablerowcount INT, 
     duplicaterowcnt    INT 

  SELECT table_catalog, 
  FROM   information_schema.tables 
  WHERE  table_type = 'BASE TABLE' 

OPEN cur 

FETCH next FROM cur INTO @DatabaseName, @SchemaName, @TableName 

      --Get List of the Columns from Table without Identity Column 
      DECLARE @ColumnList NVARCHAR(max)=NULL 

      SELECT @ColumnList = COALESCE(@ColumnList + '],[', '') + c.NAME 
      FROM   sys.columns c 
             INNER JOIN sys.tables t 
                     ON c.object_id = t.object_id 
      WHERE  Object_name(c.object_id) = @TableName 
             AND Schema_name(schema_id) = @SchemaName 
             AND is_identity = 0 

      SET @ColumnList='[' + @ColumnList + ']' 

      --Print @ColumnList 
      DECLARE @ColumnListConcat VARCHAR(max)=NULL 

      SET @ColumnListConcat=Replace(Replace(Replace(Replace(@ColumnList, '[', 
                                                    'ISNULL(Cast(['), ']', 
                                            '] AS VARCHAR(MAX)),''NULL'')'), 
                                            ',ISNULL', '+ISNULL'), '+', 

      --Create Dynamic Query for Finding duplicate Records 
      DECLARE @DuplicateSQL NVARCHAR(max)=NULL 

      SET @DuplicateSQL= ';With CTE as   (select  ''' 
                         + @DatabaseName + ''' AS DBName,' + '''' 
                         + @SchemaName + ''' AS SchemaName,' + '''' 
                         + @TableName + ''' AS TableName,' + '''' 
                         + @ColumnList + ''' AS ColumnList,' 
                         + @ColumnListConcat 
                         + ' AS ColumnConcat,    (Select count(*) from [' + @SchemaName 
                         + '].[' + @TableName 
                         + '] With (Nolock))             AS TotalTableRowCount    ,RN = row_number()             over(PARTITION BY ' 
                         + @ColumnList + '  order by ' + @ColumnList 
                         + ')             from [' + @SchemaName + '].[' 
                         + @TableName + ']  ) Select * From CTE WHERE RN>1' 

      PRINT @DuplicateSQL 

      INSERT INTO #results 

      FETCH next FROM cur INTO @DatabaseName, @SchemaName, @TableName 

CLOSE cur 


FROM   #results 
--drop table #Results 

I ran this script on my database, My Database is small so it took few seconds to complete. I got below results for my tables.It found all the duplicate records in all the table if there were any in SQL Server Database.
How to Find duplicate Records in all the tables in SQL server Database - SQL Server / TSQL Tutorial

Check out our other posts on Data Analysis / Data Validation / Data Cleansing

No comments:

Post a Comment