💡 Azure Synapse Analytics Demo: OPENROWSET with Schema
Author: Aamir Shahzad
Description: Demonstrating how to explicitly define schema using the WITH
clause in OPENROWSET
for CSV, Parquet, and JSON files using Serverless SQL Pool in Azure Synapse.
🔹 Example 1: Read Specific Columns from a CSV File
When reading CSV files, you can define schema using column positions. This avoids loading unnecessary columns and improves query performance.
SELECT TOP 10 *
FROM OPENROWSET(
BULK 'https://techbrotherssynapsestg.dfs.core.windows.net/synpasecontainer/Europe/customer_20250313_194510_EMEA.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
FIRSTROW = 2
) WITH (
customer_id INT 1, -- Customer ID
FName VARCHAR(50) 2 , -- First Name
EmployeeSalary VARCHAR(50) 4 -- Salary
) AS tbl;
Note: If ordinal positions are not provided, Synapse assumes left-to-right order starting at column 1.
🔹 Example 2: Read Parquet File with Explicit Schema
Parquet files support named fields. You don’t need to specify positions — but your column names in the WITH
clause must match the actual schema inside the Parquet file.
SELECT *
FROM OPENROWSET(
BULK 'https://techbrotherssynapsestg.dfs.core.windows.net/synpasecontainer/input/titanic.parquet',
FORMAT = 'PARQUET'
) WITH (
Sex VARCHAR(10), -- Gender
Age FLOAT, -- Age
Name VARCHAR(200) -- Full Name
) AS rows;
Tip: If the column names do not match exactly, the result will show NULLs.
🔹 Example 3: Parse and Read Values from a JSON File
This method lets you extract values from NDJSON (newline-delimited JSON) using OPENROWSET
combined with OPENJSON
.
SELECT
jsonContent,
jsonData.customer_id,
jsonData.name,
jsonData.email
FROM OPENROWSET(
BULK 'https://techbrotherssynapsestg.dfs.core.windows.net/synpasecontainer/input/sample_customers.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x0b',
FIELDTERMINATOR = '0x0b',
ROWTERMINATOR = '0x0a'
) WITH (
jsonContent VARCHAR(MAX) -- Read JSON as text blob
) AS raw
CROSS APPLY OPENJSON(jsonContent)
WITH (
customer_id INT,
name VARCHAR(100),
email VARCHAR(100)
) AS jsonData;
Explanation: This approach tricks Synapse into treating the whole line as a single string using 0x0b
delimiters, allowing JSON parsing with OPENJSON()
.
📺 Watch the Full Video Tutorial
Here’s a complete walkthrough demo on YouTube:
This blog post was created with assistance from ChatGPT and Gemini AI to ensure technical clarity and accuracy.
No comments:
Post a Comment