CTAS vs SELECT INTO in Azure Synapse Analytics | Create Table in Dedicated SQL Pool
📘 Overview
In Azure Synapse Analytics, you have two common options for creating new tables based on the results of a query:
- CTAS – Create Table As Select
- SELECT INTO – Select data into a new table
While both approaches allow you to create a new table from existing data, there are important differences in features, performance, and support within Synapse Dedicated SQL Pools.
⚖️ Comparison Table: CTAS vs SELECT INTO
Feature | CTAS | SELECT INTO |
---|---|---|
Supported in Dedicated SQL Pool | ✅ Yes | ❌ No |
Control Over Distribution Method | ✅ Yes | ❌ No |
Control Over Indexes | ✅ Yes (e.g., columnstore index) | ❌ No |
Used in Serverless SQL Pool | ❌ No | ✅ Yes |
Syntax Simplicity | Moderate | Very Simple |
✅ Example: CTAS in Dedicated SQL Pool
CREATE TABLE dbo.TotalSalesByProduct
WITH (
DISTRIBUTION = HASH(ProductID),
CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT ProductID, SUM(Quantity) AS TotalQuantity
FROM dbo.SalesFact
GROUP BY ProductID;
This will create a distributed and indexed table optimized for analytics.
✅ Example: SELECT INTO in Serverless SQL Pool
SELECT ProductID, SUM(Quantity) AS TotalQuantity
INTO dbo.TotalSalesByProduct
FROM OPENROWSET(
BULK 'https://storageaccount.dfs.core.windows.net/container/sales.parquet',
FORMAT = 'PARQUET'
) AS sales
GROUP BY ProductID;
This method is easier and works well in exploratory scenarios in Serverless SQL Pools, but doesn't allow control over performance features.
📌 Key Takeaways
- Use CTAS in Dedicated SQL Pool for production-grade performance.
- Use SELECT INTO in Serverless SQL Pool for lightweight operations or temporary results.
- CTAS gives you full control over distribution, storage format, and indexing.
📺 Watch the Video Tutorial
📚 Credit: Content created with the help of ChatGPT and Gemini.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.