Scenario: Download Script
You are working as SQL Server developer / TSQL developer for Credit Card Company. You are working on one of the big project that requires you to find the duplicate records in all the tables in SQL Server Database and delete them. You started with writing query and soon realize that there are hundreds of tables and it will take you forever to write query for each of the table. You want to find the script that can find duplicate records for you and then log the information and delete them. If that is the case you are at right place!
Solution:
As we need to find the duplicate records in all the tables, we need loop. In our case we will be using Cursor. We will loop through list of schema and table and then prepare our find duplicate query and run it and log the information in temp table #Result.
By doing that we will be able to provide details which records we have deleted.
In second part, we will prepare our query to delete those duplicate records. We are using Common Table Expressions to find duplicate records.
The log table will return us DataBase Name, Schema Name, Table Name, Column List, Duplicate Value, Total Row Count for Table and Duplicate Row Count.
Few things to consider before you run this script in production
1) As this script is going to run on all the tables in a database, this can be expensive process. Make sure you test in lower environments such as QA,UAT before run in Production.
2) As delete operation is involved, you want to make sure there is very less activity on database. Find a time such as weekend or off hours for this operation.
USE yourdbname
go
DECLARE @SchemaName VARCHAR(100)
DECLARE @TableName VARCHAR(100)
DECLARE @DatabaseName VARCHAR(100)
--Create Temp Table to Save Results
IF Object_id('tempdb..#Results') IS NOT NULL
DROP TABLE #results
CREATE TABLE #results
(
databasename VARCHAR(100),
schemaname VARCHAR(100),
tablename VARCHAR(100),
columnlist VARCHAR(max),
duplicatevalue VARCHAR(max),
totaltablerowcount INT,
duplicaterowcnt INT
)
DECLARE cur CURSOR FOR
SELECT table_catalog,
table_schema,
table_name
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
OPEN cur
FETCH next FROM cur INTO @DatabaseName, @SchemaName, @TableName
WHILE @@FETCH_STATUS = 0
BEGIN
--Get List of the Columns from Table without Identity Column
DECLARE @ColumnList NVARCHAR(max)=NULL
SELECT @ColumnList = COALESCE(@ColumnList + '],[', '') + c.NAME
FROM sys.columns c
INNER JOIN sys.tables t
ON c.object_id = t.object_id
WHERE Object_name(c.object_id) = @TableName
AND Schema_name(schema_id) = @SchemaName
AND is_identity = 0
SET @ColumnList='[' + @ColumnList + ']'
--Print @ColumnList
DECLARE @ColumnListConcat VARCHAR(max)=NULL
SET @ColumnListConcat=Replace(Replace(Replace(Replace(@ColumnList, '[',
'ISNULL(Cast(['), ']',
'] AS VARCHAR(MAX)),''NULL'')'),
',ISNULL', '+ISNULL'), '+',
'+'',''+')
--Create Dynamic Query for Finding duplicate Records
DECLARE @DuplicateSQL NVARCHAR(max)=NULL
DECLARE @DeleteDuplicateSQL NVARCHAR(max)=NULL
SET @DuplicateSQL= ';With CTE as (select '''
+ @DatabaseName + ''' AS DBName,' + ''''
+ @SchemaName + ''' AS SchemaName,' + ''''
+ @TableName + ''' AS TableName,' + ''''
+ @ColumnList + ''' AS ColumnList,'
+ @ColumnListConcat
+ ' AS ColumnConcat, (Select count(*) from ['
+ @SchemaName + '].[' + @TableName
+ '] With (Nolock)) AS TotalTableRowCount ,RN = row_number() over(PARTITION BY ' + @ColumnList
+ ' order by ' + @ColumnList
+ ') from [' + @SchemaName + '].['
+ @TableName
+ '] ) Select * From CTE WHERE RN>1'
-- PRINT @DuplicateSQL
--Prepate Delete Duplicate Records Query
SET @DeleteDuplicateSQL=
';With CTE as (select RN = row_number() over(PARTITION BY '
+ @ColumnList + ' order by ' + @ColumnList
+ ') from [' + @SchemaName + '].['
+ @TableName
+ '] ) Delete from CTE WHERE RN>1'
--Log Duplicate Record information in Temp Table
INSERT INTO #results
EXEC(@DuplicateSQL)
--Run Dynamic SQL to Delete Duplicate
EXEC(@DeleteDuplicateSQL)
FETCH next FROM cur INTO @DatabaseName, @SchemaName, @TableName
END
CLOSE cur
DEALLOCATE cur
SELECT *
FROM #results
--drop table #Results
go
DECLARE @SchemaName VARCHAR(100)
DECLARE @TableName VARCHAR(100)
DECLARE @DatabaseName VARCHAR(100)
--Create Temp Table to Save Results
IF Object_id('tempdb..#Results') IS NOT NULL
DROP TABLE #results
CREATE TABLE #results
(
databasename VARCHAR(100),
schemaname VARCHAR(100),
tablename VARCHAR(100),
columnlist VARCHAR(max),
duplicatevalue VARCHAR(max),
totaltablerowcount INT,
duplicaterowcnt INT
)
DECLARE cur CURSOR FOR
SELECT table_catalog,
table_schema,
table_name
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
OPEN cur
FETCH next FROM cur INTO @DatabaseName, @SchemaName, @TableName
WHILE @@FETCH_STATUS = 0
BEGIN
--Get List of the Columns from Table without Identity Column
DECLARE @ColumnList NVARCHAR(max)=NULL
SELECT @ColumnList = COALESCE(@ColumnList + '],[', '') + c.NAME
FROM sys.columns c
INNER JOIN sys.tables t
ON c.object_id = t.object_id
WHERE Object_name(c.object_id) = @TableName
AND Schema_name(schema_id) = @SchemaName
AND is_identity = 0
SET @ColumnList='[' + @ColumnList + ']'
--Print @ColumnList
DECLARE @ColumnListConcat VARCHAR(max)=NULL
SET @ColumnListConcat=Replace(Replace(Replace(Replace(@ColumnList, '[',
'ISNULL(Cast(['), ']',
'] AS VARCHAR(MAX)),''NULL'')'),
',ISNULL', '+ISNULL'), '+',
'+'',''+')
--Create Dynamic Query for Finding duplicate Records
DECLARE @DuplicateSQL NVARCHAR(max)=NULL
DECLARE @DeleteDuplicateSQL NVARCHAR(max)=NULL
SET @DuplicateSQL= ';With CTE as (select '''
+ @DatabaseName + ''' AS DBName,' + ''''
+ @SchemaName + ''' AS SchemaName,' + ''''
+ @TableName + ''' AS TableName,' + ''''
+ @ColumnList + ''' AS ColumnList,'
+ @ColumnListConcat
+ ' AS ColumnConcat, (Select count(*) from ['
+ @SchemaName + '].[' + @TableName
+ '] With (Nolock)) AS TotalTableRowCount ,RN = row_number() over(PARTITION BY ' + @ColumnList
+ ' order by ' + @ColumnList
+ ') from [' + @SchemaName + '].['
+ @TableName
+ '] ) Select * From CTE WHERE RN>1'
-- PRINT @DuplicateSQL
--Prepate Delete Duplicate Records Query
SET @DeleteDuplicateSQL=
';With CTE as (select RN = row_number() over(PARTITION BY '
+ @ColumnList + ' order by ' + @ColumnList
+ ') from [' + @SchemaName + '].['
+ @TableName
+ '] ) Delete from CTE WHERE RN>1'
--Log Duplicate Record information in Temp Table
INSERT INTO #results
EXEC(@DuplicateSQL)
--Run Dynamic SQL to Delete Duplicate
EXEC(@DeleteDuplicateSQL)
FETCH next FROM cur INTO @DatabaseName, @SchemaName, @TableName
END
CLOSE cur
DEALLOCATE cur
SELECT *
FROM #results
--drop table #Results
I have execute above script on my database after inserting few duplicate records in my table and here is my output.
How to find and Delete duplicate records in all the Tables in SQL server Database - SQL Server / T-SQL Tutorial
Check out our other posts on Data Analysis / Data Validation / Data Cleansing
- How to Search in all Columns for all tables in a database for Date Value in SQL Server
- How to Find Percentage of Null Values in every Column of all the Tables in SQL Server Database
- How to Find Percentage of Empty or Blank Values in every Column of all the Tables in SQL Server Database
- How to Get Row Count Of All The Tables In SQL Server Database
- How to search for a String in all Columns in all tables in SQL Server Database
- How to find and Replace Text in all the columns in all the tables in SQL server Database
- How to find values with Leading or Trailing Spaces in SQL Server Table
- How to Trim Leading and Trailing spaces for all the values in all the Columns in all the Tables in a SQL Server Database
- How to Find Duplicate Records in All the Tables in SQL Server Database
- How to Generate Select all Columns with or Without Top X Rows From All Tables Query Dynamically in SQL Server
- Concatenate SQL Server Columns for each Table Query Dynamically in SQL Server
- How to List all Missing Identity Values for all Tables in SQL Server Database
- How to Find Max and Min Values for all the columns for all the tables in SQL Server Database
Do you have video for this How to Find & Delete Duplicate Records on All Tables in SQL Server Database - SQL Server / TSQL Tutorial
ReplyDelete