05: Querying CSV, Parquet, and JSON Files in ADLS Gen2 with Serverless SQL
In Azure Synapse Analytics, you can query raw data files directly from Azure Data Lake Storage Gen2 using
Serverless SQL Pools. Below are practical examples to query CSV
, Parquet
,
and JSON
formats efficiently.
📁 Querying CSV Files
This example reads a CSV file with headers directly from ADLS Gen2:
SELECT *
FROM OPENROWSET(
BULK 'https://techbrotherssynapsestg.dfs.core.windows.net/synpasecontainer/input/sample_users.csv',
FORMAT = 'CSV',
HEADER_ROW = TRUE,
PARSER_VERSION = '2.0'
) AS rows;
💡 Tip: Use SELECT
with specific columns instead of *
to minimize scanned data and cost.
📁 Querying Parquet Files
This query fetches data from a Parquet file — optimized for analytical workloads:
SELECT TOP 4 *
FROM OPENROWSET(
BULK 'https://techbrotherssynapsestg.dfs.core.windows.net/synpasecontainer/input/titanic.parquet',
FORMAT = 'PARQUET'
) AS rows;
Reference: Sample Parquet file from Tablab
📁 Querying JSON Files (NDJSON - newline-delimited JSON)
This example demonstrates querying raw JSON documents by tricking Synapse into reading the entire line as a single column:
SELECT TOP 100
jsonContent,
JSON_VALUE(jsonContent, '$.customer_id') AS customer_id,
JSON_VALUE(jsonContent, '$.name') AS customer_name
FROM
OPENROWSET(
BULK 'https://techbrotherssynapsestg.dfs.core.windows.net/synpasecontainer/input/sample_customers.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x0b',
FIELDTERMINATOR = '0x0b',
ROWTERMINATOR = '0x0b'
)
WITH (
jsonContent VARCHAR(MAX)
) AS [result];
🛠 Explanation of Parameters:
Parameter | Normal Meaning | Usage Here |
---|---|---|
FIELDQUOTE |
Character used to wrap a field (e.g. ") | Set to 0x0b so nothing is quoted |
FIELDTERMINATOR |
Character between fields (e.g. ,) | Set to 0x0b to avoid splitting |
ROWTERMINATOR |
End of line character (e.g. \n) | Set to 0x0b to treat each line as one row |
📺 Watch the Full Tutorial
Visual walkthrough available in the video below:
This blog post was created with assistance from ChatGPT and Gemini AI to ensure technical accuracy and clarity.
No comments:
Post a Comment