Create External Tables in Azure Synapse Dedicated Pool for CSV & Parquet Files | Azure Synapse Analytics Tutorial

Create External Tables in Azure Synapse Dedicated Pool for CSV & Parquet Files

Create External Tables in Azure Synapse Dedicated Pool for CSV & Parquet Files

📘 What is an External Table?

An External Table in Azure Synapse Dedicated SQL Pool allows you to query data stored outside the database — typically in Azure Data Lake Storage Gen2 — without importing it into the SQL pool itself. This enables cost-effective querying of large files like .csv or .parquet formats directly using T-SQL.

External tables are used in scenarios like raw zone exploration, staging, or ELT pipelines where data exists in a data lake but needs to be accessed for transformations or analytics.

📁 File Formats Supported

  • CSV: Commonly used for flat files
  • Parquet: Optimized, columnar format best for analytics

🔧 Pre-requisites

  • Azure Synapse workspace with a Dedicated SQL Pool
  • Data files (CSV/Parquet) stored in ADLS Gen2
  • Database scoped credential to access the storage account
  • External data source and file format definitions

🛠️ Step-by-Step Example for CSV

1. Create Database Scoped Credential

CREATE DATABASE SCOPED CREDENTIAL myadls_cred
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2024...';

2. Create External Data Source

CREATE EXTERNAL DATA SOURCE my_adls
WITH (
    LOCATION = 'abfss://datalake@yourstorage.dfs.core.windows.net/',
    CREDENTIAL = myadls_cred
);

3. Create External File Format for CSV

CREATE EXTERNAL FILE FORMAT csv_format
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '\"',
        FIRST_ROW = 2
    )
);

4. Create External Table

CREATE EXTERNAL TABLE dbo.ExternalSalesCSV
(
    SaleID INT,
    ProductName VARCHAR(100),
    Quantity INT,
    Price MONEY
)
WITH (
    LOCATION = '/sales/csv/',
    DATA_SOURCE = my_adls,
    FILE_FORMAT = csv_format,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);

🛠️ Step-by-Step Example for Parquet

1. Create External File Format for Parquet

CREATE EXTERNAL FILE FORMAT parquet_format
WITH (
    FORMAT_TYPE = PARQUET
);

2. Create External Table

CREATE EXTERNAL TABLE dbo.ExternalProductsParquet
(
    ProductID INT,
    ProductName VARCHAR(100),
    Category VARCHAR(50),
    Price FLOAT
)
WITH (
    LOCATION = '/products/parquet/',
    DATA_SOURCE = my_adls,
    FILE_FORMAT = parquet_format
);

📌 Best Practices

  • Use Parquet format for large analytical queries to benefit from columnar compression.
  • Keep external table definitions consistent with the structure of the underlying file.
  • Use views to abstract external table logic for business users.
  • Use the REJECT_TYPE clause to handle malformed rows gracefully.

📺 Watch the Video Tutorial

📚 Credit: Content created with the help of ChatGPT and Gemini.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.