Querying CSV, Parquet, and JSON Files in ADLS Gen2 with Serverless SQL | Azure Synapse Analytics Tutorial

Querying CSV, Parquet, and JSON Files in ADLS Gen2 with Serverless SQL

05: Querying CSV, Parquet, and JSON Files in ADLS Gen2 with Serverless SQL

In Azure Synapse Analytics, you can query raw data files directly from Azure Data Lake Storage Gen2 using Serverless SQL Pools. Below are practical examples to query CSV, Parquet, and JSON formats efficiently.

📁 Querying CSV Files

This example reads a CSV file with headers directly from ADLS Gen2:

SELECT *
FROM OPENROWSET(
    BULK 'https://techbrotherssynapsestg.dfs.core.windows.net/synpasecontainer/input/sample_users.csv',
    FORMAT = 'CSV',
    HEADER_ROW = TRUE,
    PARSER_VERSION = '2.0'
) AS rows;

💡 Tip: Use SELECT with specific columns instead of * to minimize scanned data and cost.

📁 Querying Parquet Files

This query fetches data from a Parquet file — optimized for analytical workloads:

SELECT TOP 4 *
FROM OPENROWSET(
    BULK 'https://techbrotherssynapsestg.dfs.core.windows.net/synpasecontainer/input/titanic.parquet',
    FORMAT = 'PARQUET'
) AS rows;

Reference: Sample Parquet file from Tablab

📁 Querying JSON Files (NDJSON - newline-delimited JSON)

This example demonstrates querying raw JSON documents by tricking Synapse into reading the entire line as a single column:

SELECT TOP 100
    jsonContent,
    JSON_VALUE(jsonContent, '$.customer_id') AS customer_id,
    JSON_VALUE(jsonContent, '$.name') AS customer_name
FROM
    OPENROWSET(
        BULK 'https://techbrotherssynapsestg.dfs.core.windows.net/synpasecontainer/input/sample_customers.json',
        FORMAT = 'CSV',
        FIELDQUOTE = '0x0b',
        FIELDTERMINATOR = '0x0b',
        ROWTERMINATOR = '0x0b'
    )
    WITH (
        jsonContent VARCHAR(MAX)
    ) AS [result];

🛠 Explanation of Parameters:

Parameter Normal Meaning Usage Here
FIELDQUOTE Character used to wrap a field (e.g. ") Set to 0x0b so nothing is quoted
FIELDTERMINATOR Character between fields (e.g. ,) Set to 0x0b to avoid splitting
ROWTERMINATOR End of line character (e.g. \n) Set to 0x0b to treat each line as one row

📺 Watch the Full Tutorial

Visual walkthrough available in the video below:



This blog post was created with assistance from ChatGPT and Gemini AI to ensure technical accuracy and clarity.

No comments:

Post a Comment