Create External Tables in Azure Synapse Dedicated Pool for CSV & Parquet Files
📘 What is an External Table?
An External Table in Azure Synapse Dedicated SQL Pool allows you to query data stored outside the database — typically in Azure Data Lake Storage Gen2 — without importing it into the SQL pool itself. This enables cost-effective querying of large files like .csv
or .parquet
formats directly using T-SQL.
External tables are used in scenarios like raw zone exploration, staging, or ELT pipelines where data exists in a data lake but needs to be accessed for transformations or analytics.
📁 File Formats Supported
- CSV: Commonly used for flat files
- Parquet: Optimized, columnar format best for analytics
🔧 Pre-requisites
- Azure Synapse workspace with a Dedicated SQL Pool
- Data files (CSV/Parquet) stored in ADLS Gen2
- Database scoped credential to access the storage account
- External data source and file format definitions
🛠️ Step-by-Step Example for CSV
1. Create Database Scoped Credential
CREATE DATABASE SCOPED CREDENTIAL myadls_cred
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2024...';
2. Create External Data Source
CREATE EXTERNAL DATA SOURCE my_adls
WITH (
LOCATION = 'abfss://datalake@yourstorage.dfs.core.windows.net/',
CREDENTIAL = myadls_cred
);
3. Create External File Format for CSV
CREATE EXTERNAL FILE FORMAT csv_format
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '\"',
FIRST_ROW = 2
)
);
4. Create External Table
CREATE EXTERNAL TABLE dbo.ExternalSalesCSV
(
SaleID INT,
ProductName VARCHAR(100),
Quantity INT,
Price MONEY
)
WITH (
LOCATION = '/sales/csv/',
DATA_SOURCE = my_adls,
FILE_FORMAT = csv_format,
REJECT_TYPE = VALUE,
REJECT_VALUE = 0
);
🛠️ Step-by-Step Example for Parquet
1. Create External File Format for Parquet
CREATE EXTERNAL FILE FORMAT parquet_format
WITH (
FORMAT_TYPE = PARQUET
);
2. Create External Table
CREATE EXTERNAL TABLE dbo.ExternalProductsParquet
(
ProductID INT,
ProductName VARCHAR(100),
Category VARCHAR(50),
Price FLOAT
)
WITH (
LOCATION = '/products/parquet/',
DATA_SOURCE = my_adls,
FILE_FORMAT = parquet_format
);
📌 Best Practices
- Use Parquet format for large analytical queries to benefit from columnar compression.
- Keep external table definitions consistent with the structure of the underlying file.
- Use views to abstract external table logic for business users.
- Use the
REJECT_TYPE
clause to handle malformed rows gracefully.
📺 Watch the Video Tutorial
📚 Credit: Content created with the help of ChatGPT and Gemini.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.