How to Use CTAS (Create Table As Select) in Azure Synapse Analytics | Azure Synapse Analytics Tutorial

How to Use CTAS in Azure Synapse Analytics to Create New Tables

How to Use CTAS (Create Table As) in Azure Synapse Analytics

📘 What is CTAS?

CTAS stands for Create Table As Select. It’s a T-SQL statement used in Azure Synapse Dedicated SQL Pools to create a new table by selecting data from one or more existing tables. CTAS is highly efficient for:

  • Creating aggregated or filtered datasets
  • Staging intermediate results
  • Transforming or reshaping data

CTAS leverages the Massively Parallel Processing (MPP) engine of Synapse to create new distributed tables efficiently.

✅ Syntax

CREATE TABLE [schema].[new_table]
WITH (DISTRIBUTION = HASH(column) | ROUND_ROBIN | REPLICATE)
AS
SELECT ... FROM existing_table;

🛠️ Real Examples

🔹 Scenario 1: Create Table with Subset of Columns

CREATE TABLE dbo.DimCustomerName
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT customer_id, fname, lastname
FROM dbo.Customers_CSV;

🔹 Scenario 2: Create Table with Aggregated Data

CREATE TABLE dbo.TotalSalesByProduct
WITH (DISTRIBUTION = HASH(ProductID))
AS
SELECT ProductID, SUM(Quantity) AS TotalQuantitySold
FROM dbo.SalesFact_CSV
GROUP BY ProductID;

🔹 Scenario 3: Create Table with Filtered Data

CREATE TABLE dbo.HighValueCustomers
WITH (DISTRIBUTION = REPLICATE)
AS
SELECT customer_id, fname, lastname
FROM dbo.Customers_CSV
WHERE customer_id IN (1, 3);

🔹 Scenario 4: Rename Columns and Cast Data Types

CREATE TABLE dbo.ProductPricing
WITH (DISTRIBUTION = HASH(ProductID_New))
AS
SELECT ProductID AS ProductID_New,
       ProductName AS Name,
       Category,
       CAST(Price AS DECIMAL(12, 4)) AS Price_Adjusted
FROM dbo.ProductInfo_Parquet;

🔹 Scenario 5: Create Table from JOIN

CREATE TABLE dbo.CustomerSales
WITH (DISTRIBUTION = HASH(customer_id))
AS
SELECT c.customer_id, c.fname, c.lastname,
       s.SaleID, s.ProductID, s.Quantity, s.SaleDate
FROM dbo.Customers_CSV c
JOIN dbo.SalesFact_CSV s
  ON c.customer_id = s.CustomerID;

🔹 Scenario 6: Use a Different Schema

CREATE TABLE staging.StagedProducts
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT ProductID, ProductName, Category, Price
FROM dbo.ProductInfo_Parquet
WHERE Category = 'Electronics';

🔹 Scenario 7: Create Table with New Data Types

CREATE TABLE dbo.SalesFactBigIntID
WITH (DISTRIBUTION = HASH(ProductID),
      CLUSTERED COLUMNSTORE INDEX)
AS
SELECT CAST(SaleID AS BIGINT) AS SaleID_BigInt,
       ProductID, Quantity, SaleDate, CustomerID
FROM dbo.SalesFact_CSV;

📌 CTAS Best Practices

  • Always choose the right DISTRIBUTION method (HASH for large fact tables, REPLICATE for small dimensions).
  • Use CTAS for performance-optimized reporting layers.
  • Drop and recreate staging tables during ELT to minimize fragmentation.
  • Use appropriate data types to reduce storage size.

📺 Watch the Video Tutorial

📚 Credit: Content created with the help of ChatGPT and Gemini.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.