How to Use CTAS (Create Table As) in Azure Synapse Analytics
📘 What is CTAS?
CTAS stands for Create Table As Select. It’s a T-SQL statement used in Azure Synapse Dedicated SQL Pools to create a new table by selecting data from one or more existing tables. CTAS is highly efficient for:
- Creating aggregated or filtered datasets
- Staging intermediate results
- Transforming or reshaping data
CTAS leverages the Massively Parallel Processing (MPP) engine of Synapse to create new distributed tables efficiently.
✅ Syntax
CREATE TABLE [schema].[new_table]
WITH (DISTRIBUTION = HASH(column) | ROUND_ROBIN | REPLICATE)
AS
SELECT ... FROM existing_table;
🛠️ Real Examples
🔹 Scenario 1: Create Table with Subset of Columns
CREATE TABLE dbo.DimCustomerName
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT customer_id, fname, lastname
FROM dbo.Customers_CSV;
🔹 Scenario 2: Create Table with Aggregated Data
CREATE TABLE dbo.TotalSalesByProduct
WITH (DISTRIBUTION = HASH(ProductID))
AS
SELECT ProductID, SUM(Quantity) AS TotalQuantitySold
FROM dbo.SalesFact_CSV
GROUP BY ProductID;
🔹 Scenario 3: Create Table with Filtered Data
CREATE TABLE dbo.HighValueCustomers
WITH (DISTRIBUTION = REPLICATE)
AS
SELECT customer_id, fname, lastname
FROM dbo.Customers_CSV
WHERE customer_id IN (1, 3);
🔹 Scenario 4: Rename Columns and Cast Data Types
CREATE TABLE dbo.ProductPricing
WITH (DISTRIBUTION = HASH(ProductID_New))
AS
SELECT ProductID AS ProductID_New,
ProductName AS Name,
Category,
CAST(Price AS DECIMAL(12, 4)) AS Price_Adjusted
FROM dbo.ProductInfo_Parquet;
🔹 Scenario 5: Create Table from JOIN
CREATE TABLE dbo.CustomerSales
WITH (DISTRIBUTION = HASH(customer_id))
AS
SELECT c.customer_id, c.fname, c.lastname,
s.SaleID, s.ProductID, s.Quantity, s.SaleDate
FROM dbo.Customers_CSV c
JOIN dbo.SalesFact_CSV s
ON c.customer_id = s.CustomerID;
🔹 Scenario 6: Use a Different Schema
CREATE TABLE staging.StagedProducts
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT ProductID, ProductName, Category, Price
FROM dbo.ProductInfo_Parquet
WHERE Category = 'Electronics';
🔹 Scenario 7: Create Table with New Data Types
CREATE TABLE dbo.SalesFactBigIntID
WITH (DISTRIBUTION = HASH(ProductID),
CLUSTERED COLUMNSTORE INDEX)
AS
SELECT CAST(SaleID AS BIGINT) AS SaleID_BigInt,
ProductID, Quantity, SaleDate, CustomerID
FROM dbo.SalesFact_CSV;
📌 CTAS Best Practices
- Always choose the right
DISTRIBUTION
method (HASH for large fact tables, REPLICATE for small dimensions). - Use CTAS for performance-optimized reporting layers.
- Drop and recreate staging tables during ELT to minimize fragmentation.
- Use appropriate data types to reduce storage size.
📺 Watch the Video Tutorial
📚 Credit: Content created with the help of ChatGPT and Gemini.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.